spark-user mailing list archives

From Alessandro Solimando <alessandro.solima...@gmail.com>
Subject Re: Union of multiple data frames
Date Fri, 06 Apr 2018 07:31:27 GMT
Hello Cesar,
can you add some details like: number of columns, avg number of rows in the
DFs, time spent to compute the plan with all the unions, and the time
needed to perform the action?

Thanks,
Alessandro
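
One way to collect the two timings asked for above is a small wall-clock helper. This is only a minimal sketch: `TimingSketch` and `timed` are hypothetical names, not part of Spark, and the Spark calls shown in the comments assume a `dfs: Seq[DataFrame]` like the one in the quoted messages.

```scala
object TimingSketch {
  // Runs `block` once and returns its result together with the elapsed
  // wall-clock time in milliseconds.
  def timed[A](block: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = block
    val elapsedMs = (System.nanoTime() - start) / 1e6
    (result, elapsedMs)
  }
}

// Hypothetical usage inside a Spark session (not executed here):
//   import TimingSketch.timed
//   val (dfUnion, unionMs) = timed(dfs.reduce(_ union _))               // building the unions
//   val (_, planMs)        = timed(dfUnion.queryExecution.executedPlan) // planning only, no job
//   val (_, actionMs)      = timed(dfUnion.count())                     // the action itself
```

Timing the `reduce` separately is deliberate: building each union may already involve plan analysis, so that cost would otherwise be missed.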

On 5 April 2018 at 23:22, Cesar <cesar7@gmail.com> wrote:

> Thanks for your answers.
>
> The suggested method works when the number of Data Frames is small.
>
> However, I am trying to union more than 30 Data Frames, and creating the
> plan takes longer than the execution itself, which should not be the case.
>
> Thanks!
> --
> Cesar
>
> On Thu, Apr 5, 2018 at 1:29 PM, Andy Davidson <
> Andy@santacruzintegration.com> wrote:
>
>>
>> Hi Cesar
>>
>> I have used Brandon's approach in the past without any problem.
>>
>> Andy
>>
>> From: Brandon Geise <brandongeise@gmail.com>
>> Date: Thursday, April 5, 2018 at 11:23 AM
>> To: Cesar <cesar7@gmail.com>, "user @spark" <user@spark.apache.org>
>> Subject: Re: Union of multiple data frames
>>
>> Maybe something like:
>>
>> // Start from the first DataFrame: an empty DataFrame has no columns,
>> // so a union with it fails the column-count check.
>> var finalDF = dfs.head
>>
>> for (df <- dfs.tail) {
>>     finalDF = finalDF.union(df)
>> }
>>
>> Where dfs is a non-empty Seq of DataFrames.
>>
>>
>>
>> *From: *Cesar <cesar7@gmail.com>
>> *Date: *Thursday, April 5, 2018 at 2:17 PM
>> *To: *user <user@spark.apache.org>
>> *Subject: *Union of multiple data frames
>>
>>
>>
>>
>>
>> The following code works for small n, but not for large n (>20):
>>
>>
>>
>> val dfUnion = Seq(df1,df2,df3,...dfn).reduce(_ union _)
>>
>> dfUnion.show()
>>
>>
>>
>> By not working, I mean that Spark takes a long time to create the
>> execution plan.
>>
>> *Is there a more efficient way to perform a union of multiple data frames?*
>>
>> Thanks
>>
>> --
>>
>> Cesar Flores
>>
>>
>
>
> --
> Cesar Flores
>
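
If the plan-creation time grows with the depth of the nested unions that `reduce(_ union _)` produces, one thing to try is a pairwise (balanced) reduction, which combines neighbours level by level. The sketch below is generic and does not depend on Spark; `UnionSketch` and `balancedReduce` are hypothetical names introduced for illustration. Whether this actually shortens planning for a given Spark version is an assumption that should be checked by measurement.

```scala
object UnionSketch {
  // Pairwise reduction: combines adjacent elements level by level, so the
  // resulting expression tree has depth about log2(n) instead of n - 1.
  def balancedReduce[T](items: Seq[T])(combine: (T, T) => T): T = {
    require(items.nonEmpty, "balancedReduce needs at least one element")

    @annotation.tailrec
    def loop(level: Vector[T]): T =
      if (level.size == 1) level.head
      else loop(
        level.grouped(2).map { pair =>
          // An odd element at the end of a level is carried over unchanged.
          if (pair.size == 2) combine(pair(0), pair(1)) else pair(0)
        }.toVector
      )

    loop(items.toVector)
  }
}

// Hypothetical usage with DataFrames (not executed here):
//   val dfUnion = UnionSketch.balancedReduce(dfs)(_ union _)
```

The result contains the same rows as the left-to-right `reduce`, since only the grouping of the unions changes, not their order.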
