spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rex X <dnsr...@gmail.com>
Subject Re: How to make new composite columns by combining rows in the same group?
Date Fri, 26 Aug 2016 11:56:53 GMT
The data.csv need to be corrected:


1. Given following CSV file
$cat data.csv

ID,City,Zip,Price,Rating
1,A,95123,100,1
1,B,95124,102,2
1,A,95126,100,2
2,B,95123,200,1
2,B,95124,201,2
2,C,95124,203,1
3,A,95126,300,2
3,C,95124,280,1
4,C,95124,400,2


On Fri, Aug 26, 2016 at 4:54 AM, Rex X <dnsring@gmail.com> wrote:

> 1. Given following CSV file
>
> $cat data.csv
>
> ID,City,Zip,Price,Rating1,A,95123,100,01,B,95124,102,11,A,95126,100,12,B,95123,200,02,B,95124,201,12,C,95124,203,03,A,95126,300,13,C,95124,280,04,C,95124,400,1
>
>
> We want to group by ID, and make new composite columns of Price and Rating
> based on the value of $City-$Zip.
>
>
> 2. The Expected Result:
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>   ID
>   A_95123_Price
>    A_95123_Rating
>   A_95126_Price
>   A_95126_Rating
>   B_95123_Price
>   B_95123_Rating
>   B_95124_Price
>   B_95124_Rating
>   C_95124_Price
>   C_95124_Rating
>
>
>   1
>   100
>   1
>   100
>   2
>   0
>   0
>   102
>   2
>   0
>   0
>
>
>   2
>   0
>   0
>   0
>   0
>   200
>   1
>   201
>   2
>   203
>   1
>
>
>   3
>   0
>   0
>   300
>   2
>   0
>   0
>   0
>   0
>   280
>   1
>
>
>   4
>   0
>   0
>   0
>   0
>   0
>   0
>   0
>   0
>   400
>   2
>
> Any tips would be greatly appreciated!
>
> Thank you.
>
> Regards,
> Rex
>
>

Mime
View raw message