spark-user mailing list archives

From Jean Georges Perrin <...@jgp.net>
Subject Re: How to merge multiple rows
Date Wed, 22 Aug 2018 20:12:28 GMT
How do you do it now? 

You could use withColumn("newDetails", <concatenation of DETAIL_1, DETAIL_2, ...>)
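
To make the expected result concrete, here is a minimal plain-Python sketch of the semantics being discussed, not Spark code and not necessarily what either poster runs. It shows the ordered merge of the long format (group by ID, sort by PART, join with "+") and the column concatenation of the wide format. The function names merge_long and merge_wide and the sample tuples are made up for illustration; missing wide cells are assumed to be None.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows modelled on the quoted message: (ID, PART, DETAILS).
# They are deliberately shuffled to show that PART drives the order.
long_rows = [
    (1, 2, "A2"),
    (1, 1, "A1"),
    (3, 1, "C1"),
    (1, 3, "A3"),
    (2, 1, "B1"),
]

# Wide variant: (ID, DETAIL_1, DETAIL_2, DETAIL_3), None for missing cells.
wide_rows = [
    (1, "A1", "A2", "A3"),
    (2, "B1", None, None),
    (3, "C1", None, None),
]


def merge_long(rows, sep="+"):
    """Group by ID, order each group by PART, join DETAILS with sep."""
    ordered = sorted(rows, key=itemgetter(0, 1))  # sort by (ID, PART)
    return [
        (key, sep.join(details for _, _, details in group))
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


def merge_wide(rows, sep="+"):
    """Join the detail columns of each row, skipping missing (None) cells."""
    return [(row[0], sep.join(d for d in row[1:] if d is not None)) for row in rows]


long_result = merge_long(long_rows)
wide_result = merge_wide(wide_rows)
print(long_result)
print(wide_result)
```

Both calls should produce the same (ID, DETAILS) pairs, which makes this usable as a reference when checking a Spark job on a small sample. In Spark itself, a comparable route for the long format would presumably be groupBy on ID with collect_list over a struct of (PART, DETAILS), followed by sort_array and concat_ws; that is a suggestion only and has not been tested on the dataset in question.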


jg


> On Aug 22, 2018, at 16:04, msbreuer <msbreuer@gmail.com> wrote:
> 
> A dataframe with the following contents is given:
> 
> ID PART DETAILS
> 1    1 A1
> 1    2 A2
> 1    3 A3
> 2    1 B1
> 3    1 C1
> 
> The target format should be as follows:
> 
> ID DETAILS
> 1 A1+A2+A3
> 2 B1
> 3 C1
> 
> Note, the order of A1-3 is important.
> 
> Currently I am using this alternative:
> 
> ID DETAIL_1 DETAIL_2 DETAIL_3
> 1 A1       A2       A3
> 2 B1
> 3 C1
> 
> What would be the best method to do such a transformation on a large dataset?
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscribe@spark.apache.org
> 

