spark-user mailing list archives

From Brandon Geise <brandonge...@gmail.com>
Subject Re: Union of multiple data frames
Date Thu, 05 Apr 2018 18:23:21 GMT
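Maybe something like:

// Start from the first DataFrame: emptyDataFrame has no columns,
// so a union with it fails on the column-count check.
var finalDF = dfs.head

for (df <- dfs.tail) {
    finalDF = finalDF.union(df)
}

where dfs is a non-empty Seq of DataFrames that all share the same schema.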
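
If the slow part is that each union adds one more level to an ever-deeper plan, one thing that might be worth trying is to combine the frames pairwise, so the nesting depth grows roughly with log2(n) rather than n. The sketch below is a plain Scala helper with no Spark dependency; `BalancedReduce` and `balancedReduce` are hypothetical names, not part of Spark, and whether this actually shortens planning time on a given Spark version is an assumption that would need to be measured.

```scala
// Hypothetical helper (not part of Spark): reduce a sequence pairwise so the
// resulting expression tree has depth about log2(n) instead of n - 1.
object BalancedReduce {
  def balancedReduce[A](items: Seq[A])(op: (A, A) => A): A = {
    require(items.nonEmpty, "balancedReduce needs at least one element")
    var level: Seq[A] = items
    while (level.size > 1) {
      // Combine neighbours; an unpaired last element is carried to the next round.
      level = level.grouped(2).map {
        case Seq(a, b) => op(a, b)
        case Seq(a)    => a
      }.toVector
    }
    level.head
  }
}
```

With a Seq of DataFrames that share a schema, the call would look like `BalancedReduce.balancedReduce(dfs)(_ union _)`. The result contains the same rows as the sequential loop; only the shape of the union tree differs.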

 

From: Cesar <cesar7@gmail.com>
Date: Thursday, April 5, 2018 at 2:17 PM
To: user <user@spark.apache.org>
Subject: Union of multiple data frames

 

 

The following code works for small n, but not for large n (>20):

 

val dfUnion = Seq(df1,df2,df3,...dfn).reduce(_ union _)

dfUnion.show()

 

By "not working", I mean that Spark takes a long time to create the execution plan.

 

Is there a more efficient way to perform a union of multiple data frames?


 

thanks

-- 

Cesar Flores

