spark-user mailing list archives

From Cesar <ces...@gmail.com>
Subject Re: Union of multiple data frames
Date Thu, 05 Apr 2018 21:22:39 GMT
Thanks for your answers.

The suggested method works when the number of Data Frames is small.

However, I am trying to union more than 30 Data Frames, and creating the
plan takes longer than the execution itself, which should not be the case.
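
One workaround that is often suggested for slow planning with many unions
is to do the union at the RDD level and rebuild a single Data Frame
afterwards, so the analyzer sees one flat plan instead of a long chain of
unions. The following is only a sketch for spark-shell, not a benchmarked
fix: `unionViaRdd` is a hypothetical helper name (not a Spark API), and it
assumes every input has the same column names and types in the same order.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.StructType

// Hypothetical helper, not part of Spark: unions many DataFrames through
// their underlying RDDs to avoid building a deep chain of union plans.
def unionViaRdd(spark: SparkSession, dfs: Seq[DataFrame]): DataFrame = {
  require(dfs.nonEmpty, "dfs must contain at least one DataFrame")
  val first = dfs.head

  // Assumption: all inputs share column names and types, in the same order.
  require(dfs.forall(_.dtypes.sameElements(first.dtypes)),
    "all DataFrames must have the same column names and types")

  // Mark every column nullable so rows from any input fit the schema.
  val schema = StructType(first.schema.map(_.copy(nullable = true)))

  // SparkContext.union combines all RDDs in a single step.
  val rows = spark.sparkContext.union(dfs.map(_.rdd))
  spark.createDataFrame(rows, schema)
}
```

Usage would be `val dfUnion = unionViaRdd(spark, dfs)`, where `dfs` is the
Seq of Data Frames. The trade-off is that going through `.rdd` converts the
data to Row objects, so execution itself may be slower than a native
Data Frame union even if planning gets faster.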

Thanks!
--
Cesar

On Thu, Apr 5, 2018 at 1:29 PM, Andy Davidson
<Andy@santacruzintegration.com> wrote:

>
> Hi Cesar
>
> I have used Brandon's approach in the past without any problems.
>
> Andy
> From: Brandon Geise <brandongeise@gmail.com>
> Date: Thursday, April 5, 2018 at 11:23 AM
> To: Cesar <cesar7@gmail.com>, "user @spark" <user@spark.apache.org>
> Subject: Re: Union of multiple data frames
>
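> Maybe something like
>
> // Start from the first DataFrame: emptyDataFrame has no columns, so a
> // union with a DataFrame that has columns fails with an AnalysisException.
> var finalDF = dfs.head
>
> for (df <- dfs.tail) {
>     finalDF = finalDF.union(df)
> }
>
> Where dfs is a non-empty Seq of dataframes.
>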
> *From: *Cesar <cesar7@gmail.com>
> *Date: *Thursday, April 5, 2018 at 2:17 PM
> *To: *user <user@spark.apache.org>
> *Subject: *Union of multiple data frames
>
> The following code works for small n, but not for large n (>20):
>
> val dfUnion = Seq(df1,df2,df3,...dfn).reduce(_ union _)
> dfUnion.show()
>
> By not working, I mean that Spark takes a lot of time to create the
> execution plan.
>
> *Is there a more optimal way to perform a union of multiple data frames?*
>
> thanks
>
> --
> Cesar Flores
>


-- 
Cesar Flores
