spark-user mailing list archives

From Brandon Geise <>
Subject Re: Union of multiple data frames
Date Thu, 05 Apr 2018 18:23:21 GMT
Maybe something like this:


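// Start from the first frame: emptyDataFrame has no columns, so a union
// with it fails on the column-count mismatch.
var finalDF = dfs.head

for (df <- dfs.tail) {
    finalDF = finalDF.union(df)
}


where dfs is a non-empty Seq of DataFrames.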
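
Note that a loop like the one above still builds the same left-deep chain of unions as the quoted reduce(_ union _), so on its own it may not shorten planning time. One thing that might be worth trying is to combine the frames pairwise, so the nesting depth grows roughly with log2(n) rather than n. Below is a minimal sketch of that idea; balancedReduce is a hypothetical helper (not a Spark API), written generically so it can be tried without a cluster, and whether it actually helps planning time is an assumption that would need measuring on your Spark version.

```scala
// Hypothetical helper (not part of Spark): combine a non-empty sequence
// pairwise, level by level, so the nesting depth is about log2(n)
// instead of the n - 1 produced by a plain left-to-right reduce.
def balancedReduce[A](items: Seq[A])(op: (A, A) => A): A = {
  require(items.nonEmpty, "balancedReduce needs at least one element")
  var level: Seq[A] = items
  while (level.size > 1) {
    // Combine neighbours; a trailing unpaired element is carried up unchanged.
    level = level.grouped(2).map {
      case Seq(a, b) => op(a, b)
      case Seq(a)    => a
    }.toVector
  }
  level.head
}
```

With DataFrames this would be called as balancedReduce(dfs)(_ union _). Element order is preserved, so the result should contain the same rows as the sequential version.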


From: Cesar <>
Date: Thursday, April 5, 2018 at 2:17 PM
To: user <>
Subject: Union of multiple data frames



The following code works for small n, but not for large n (>20):


val dfUnion = Seq(df1,df2,df3,...dfn).reduce(_ union _)


By not working, I mean that Spark takes a lot of time to create the execution plan.


Is there a more efficient way to perform a union of multiple data frames?




Cesar Flores
