spark-user mailing list archives

From Aakash Basu <aakash.spark....@gmail.com>
Subject Problem with CSV line break data in PySpark 2.1.0
Date Sun, 03 Sep 2017 10:15:59 GMT
Hi,

I have a dataset in which a few rows of column F, as shown below, contain
line breaks in the CSV file.

[image: Inline image 1]

When Spark reads the file, the text after the line break comes in as a
completely new row, as shown below.

[image: Inline image 2]

I want PySpark 2.1.0 to read such a value as a single field, ignoring the
line break after the date. That is not happening with the
com.databricks.spark.csv reader I am using. Instead, nulls are created
after the date, on line 2, for the rest of the columns from G to the end.
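
A minimal sketch of the parsing behaviour being asked for, in plain Python
rather than Spark. It assumes the multi-line values in column F are wrapped
in double quotes, which the message above does not confirm. The function
name and the sample data are hypothetical.

```python
import csv
import io


def parse_csv_with_embedded_newlines(text):
    """Parse CSV text, keeping line breaks inside quoted fields in one row."""
    # newline="" hands the raw line endings to the csv module, so a line
    # break inside a quoted field stays part of that field.
    reader = csv.reader(io.StringIO(text, newline=""))
    # Replace any line break left inside a field with a single space.
    return [[" ".join(field.splitlines()) for field in row] for row in reader]


# Hypothetical sample: the quoted value in column F spans two physical lines.
sample = 'E,F,G\n1,"2017-09-03\ncontinued",x\n2,2017-09-04,y\n'
rows = parse_csv_with_embedded_newlines(sample)
for row in rows:
    print(row)
```

In Spark 2.1.0 a function like this could be applied to whole-file contents
obtained from sc.wholeTextFiles, since the line-based reader splits records
before any quote handling happens. From Spark 2.2.0 onward the built-in CSV
reader has a multiLine option that may make such a workaround unnecessary.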

Could someone please help me handle this?

Thanks,
Aakash.
