spark-user mailing list archives

From Debabrata Ghosh <mailford...@gmail.com>
Subject How to flatten a row in PySpark
Date Thu, 12 Oct 2017 16:09:31 GMT
Hi,
Greetings!

I have data in which each row has the following format:

ABZ|ABZ|AF|2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y|1,2,3,4,5|730

I want to convert it into several rows in the format below:

ABZ|ABZ|AF|2|1|730
ABZ|ABZ|AF|2|2|730
.
.
.
ABZ|ABZ|AF|3|1|730
ABZ|ABZ|AF|3|2|730
ABZ|ABZ|AF|3|3|730
.
.
.
ABZ|ABZ|AF|Y|4|730
ABZ|ABZ|AF|Y|5|730

Basically, I want to take every combination of the values in the 4th and 5th
columns (which are comma-delimited) and generate one output row per
combination from the single input row. Could you please suggest a good way
of achieving this in PySpark? Thanks in advance!
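
To make the requirement concrete, here is a rough plain-Python sketch of the
expansion I have in mind. It is not PySpark code, and the function name
flatten_row and its multi_cols parameter are made up for illustration. It
assumes only the 4th and 5th fields (zero-based indices 3 and 4) hold
comma-delimited values.

```python
from itertools import product


def flatten_row(line, multi_cols=(3, 4)):
    """Expand one pipe-delimited line into one line per combination of the
    comma-delimited values in the columns listed in multi_cols.

    flatten_row and multi_cols are hypothetical names used for illustration.
    """
    fields = line.split("|")
    # Multi-valued columns become a list of choices; every other column
    # becomes a single-element list so product() leaves it unchanged.
    choices = [
        field.split(",") if index in multi_cols else [field]
        for index, field in enumerate(fields)
    ]
    return ["|".join(combo) for combo in product(*choices)]


row = "ABZ|ABZ|AF|2,3,7,8,B,C,D,E,J,K,L,M,P,Q,T,U,X,Y|1,2,3,4,5|730"
rows = flatten_row(row)
print(len(rows))  # 18 values x 5 values = 90 rows
print(rows[0])    # ABZ|ABZ|AF|2|1|730
print(rows[-1])   # ABZ|ABZ|AF|Y|5|730
```

I assume a function like this could be applied per line with rdd.flatMap,
or that the same result could be reached on a DataFrame by combining split
and explode from pyspark.sql.functions, but I have not verified either
approach on my data.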

Regards,

Debu
