spark-user mailing list archives

From Andy Davidson <A...@SantaCruzIntegration.com>
Subject Re: Union of multiple data frames
Date Thu, 05 Apr 2018 18:29:24 GMT

Hi Cesar,

I have used Brandon's approach in the past without any problems.

Andy
From:  Brandon Geise <brandongeise@gmail.com>
Date:  Thursday, April 5, 2018 at 11:23 AM
To:  Cesar <cesar7@gmail.com>, "user @spark" <user@spark.apache.org>
Subject:  Re: Union of multiple data frames

> Maybe something like
>  
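> // Seed with the first DataFrame: emptyDataFrame has no columns,
> // so a union with it fails the column-count check.
> var finalDF = dfs.head
> for (df <- dfs.tail) {
>     finalDF = finalDF.union(df)
> }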
>  
>  
> Where dfs is a Seq of dataframes.
>  
> 
> From: Cesar <cesar7@gmail.com>
> Date: Thursday, April 5, 2018 at 2:17 PM
> To: user <user@spark.apache.org>
> Subject: Union of multiple data frames
> 
> The following code works for small n, but not for large n (>20):
> 
>  
> 
> val dfUnion = Seq(df1,df2,df3,...dfn).reduce(_ union _)
> 
> dfUnion.show()
> 
>  
> 
> By not working, I mean that Spark takes a lot of time to create the execution
> plan.
> 
>  
> 
> Is there a more efficient way to perform a union of multiple data frames?
> 
>  
> 
> thanks
> -- 
> 
> Cesar Flores
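
One option that sometimes comes up for this kind of problem is to combine the data frames pairwise, level by level, instead of folding them left to right. That keeps the nesting depth of the union calls near log2(n) rather than n - 1. The sketch below is only an illustration in plain Scala: balancedReduce is a hypothetical helper name, not a Spark API, and whether it shortens planning time for a given Spark version has not been measured here.

```scala
// Hypothetical helper (not part of Spark): pairwise, tree-shaped reduction.
// Neighbouring elements are combined level by level, so `op` is nested
// roughly log2(n) deep instead of n - 1 deep as with a left-to-right reduce.
// Element order is preserved.
def balancedReduce[T](items: Seq[T])(op: (T, T) => T): T = {
  require(items.nonEmpty, "balancedReduce needs at least one element")
  var level = items.toVector
  while (level.size > 1) {
    // Each group holds two elements, or one if the level has odd length;
    // reducing a one-element group returns that element unchanged.
    level = level.grouped(2).map(_.reduce(op)).toVector
  }
  level.head
}
```

With the data frames from the question this would be called as balancedReduce(dfs)(_ union _), where dfs is the Seq of data frames. The result contains the same rows as the reduce(_ union _) version; only the shape of the plan that Spark has to analyze differs.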


