spark-user mailing list archives

From SNEHASISH DUTTA <info.snehas...@gmail.com>
Subject CSV use case
Date Wed, 21 Feb 2018 08:53:58 GMT
Hi,

I am using the Spark 2.2 CSV reader.

I have data in the following format:

123|123|"abc"||""|"xyz"

Here || (an empty, unquoted field) is null,
and "" must be read as a single blank character, as per the requirement.

I was using the sep option set to pipe (|)
and the quote option set to the double-quote character.
After parsing the data, I was able to satisfy all of the above conditions
using regex.
It started failing when column values such as "|" and """ appeared,
i.e. when the separator or the quote character is itself a column value.
Spark treats these characters as delimiters and produces extra columns.

After this I tried the escape option for "|", but the results were similar.

I then tried reading the data as a dataset and splitting each line on "\\|",
which had a similar outcome.

Is there any way to resolve this?
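
To make the expected result concrete, here is a minimal plain-Python sketch of the field rules described above. It is not Spark API: the function name split_record and its parameters are hypothetical. It assumes that a quote closes a field only when it is immediately followed by the separator or by the end of the line, which is one reading of the sample values "|" and """. A value that itself contains a quote followed by a pipe would still be split incorrectly.

```python
from typing import List, Optional


def split_record(line: str, sep: str = "|", quote: str = '"',
                 blank: str = " ") -> List[Optional[str]]:
    """Split one pipe-delimited record into fields (illustrative sketch).

    Assumed rules, taken from the description in this message:
      * an unquoted empty field (||) becomes None (null)
      * a quoted empty field ("") becomes `blank` (one blank character)
      * inside a quoted field, a quote ends the field only when the next
        character is the separator or the end of the line
    """
    fields: List[Optional[str]] = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == quote:
            # Quoted field: scan until a quote followed by sep or end of line.
            i += 1
            buf = []
            while i < n:
                ch = line[i]
                if ch == quote and (i + 1 == n or line[i + 1] == sep):
                    break
                buf.append(ch)
                i += 1
            if i >= n:
                raise ValueError("unterminated quoted field")
            i += 1  # step over the closing quote
            value = "".join(buf)
            fields.append(value if value else blank)
        else:
            # Unquoted field: everything up to the next separator.
            j = line.find(sep, i)
            if j == -1:
                j = n
            value = line[i:j]
            fields.append(value if value else None)
            i = j
        if i >= n:
            break
        i += 1  # step over the separator
    return fields


if __name__ == "__main__":
    print(split_record('123|123|"abc"||""|"xyz"'))
    print(split_record('1|"|"|"""|2'))
```

Logic of this kind could be applied per line after reading the file as plain text rather than through the CSV reader. That is an assumption, not a tested Spark recipe.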

Thanks and Regards,
Snehasish
