spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Xinh Huynh <xinh.hu...@gmail.com>
Subject Re: How to make new composite columns by combining rows in the same group?
Date Fri, 26 Aug 2016 15:31:50 GMT
That looks like a pivot table. Have you looked into using the pivot table method with DataFrames?

Xinh

> On Aug 26, 2016, at 4:54 AM, Rex X <dnsring@gmail.com> wrote:
> 
> 1. Given following CSV file
> $cat data.csv
> 
> ID,City,Zip,Price,Rating
> 1,A,95123,100,0
> 1,B,95124,102,1
> 1,A,95126,100,1
> 2,B,95123,200,0
> 2,B,95124,201,1
> 2,C,95124,203,0
> 3,A,95126,300,1
> 3,C,95124,280,0
> 4,C,95124,400,1
> 
> We want to group by ID, and make new composite columns of Price and Rating based on the
value of $City-$Zip. 
> 
> 
> 2. The Expected Result:
> 
> ID	A_95123_Price	 A_95123_Rating	A_95126_Price	A_95126_Rating	B_95123_Price	B_95123_Rating
B_95124_Price	B_95124_Rating	C_95124_Price	C_95124_Rating
> 1	100	1	100	2	0	0	102	2	0	0
> 2	0	0	0	0	200	1	201	2	203	1
> 3	0	0	300	2	0	0	0	0	280	1
> 4	0	0	0	0	0	0	0	0	400	2
> 
> Any tips would be greatly appreciated!
> 
> Thank you.
> 
> Regards,
> Rex
> 

Mime
View raw message