spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rex X <dnsr...@gmail.com>
Subject How to make new composite columns by combining rows in the same group?
Date Fri, 26 Aug 2016 11:54:36 GMT
1. Given following CSV file

$cat data.csv

ID,City,Zip,Price,Rating1,A,95123,100,01,B,95124,102,11,A,95126,100,12,B,95123,200,02,B,95124,201,12,C,95124,203,03,A,95126,300,13,C,95124,280,04,C,95124,400,1


We want to group by ID, and make new composite columns of Price and Rating
based on the value of $City-$Zip.


2. The Expected Result:















  ID
  A_95123_Price
   A_95123_Rating
  A_95126_Price
  A_95126_Rating
  B_95123_Price
  B_95123_Rating
  B_95124_Price
  B_95124_Rating
  C_95124_Price
  C_95124_Rating


  1
  100
  1
  100
  2
  0
  0
  102
  2
  0
  0


  2
  0
  0
  0
  0
  200
  1
  201
  2
  203
  1


  3
  0
  0
  300
  2
  0
  0
  0
  0
  280
  1


  4
  0
  0
  0
  0
  0
  0
  0
  0
  400
  2

Any tips would be greatly appreciated!

Thank you.

Regards,
Rex

Mime
View raw message