spark-user mailing list archives

From JG Perrin
Subject RE: Problem with CSV line break data in PySpark 2.1.0
Date Tue, 05 Sep 2017 19:06:00 GMT
Have you tried the built-in parser, not the databricks one (which is not really used anymore)?
What does your original CSV look like?
What does your code look like? There are quite a few options for reading a CSV…
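
As a rough sketch of what reading with the built-in parser could look like: the option names below are standard reader options, but to my knowledge "multiLine" is only accepted by the built-in CSV source from Spark 2.2 onward, so this assumes an upgrade from 2.1.0. It also assumes the file has a header row and that fields containing line breaks are wrapped in double quotes. The helper name read_multiline_csv is hypothetical.

```python
# Options for the built-in CSV source when quoted fields span several lines.
# Assumptions: Spark 2.2 or later (for "multiLine"), a header row, and
# double-quoted fields around any embedded line breaks.
CSV_OPTIONS = {
    "header": "true",
    "multiLine": "true",
    "quote": '"',
}


def read_multiline_csv(spark, path):
    """Read `path` with the built-in CSV reader.

    `spark` is an existing SparkSession; `path` is the CSV location.
    """
    return spark.read.options(**CSV_OPTIONS).csv(path)
```

With an existing session this would be called as read_multiline_csv(spark, "data.csv"), where "data.csv" is a placeholder path.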

From: Aakash Basu
Sent: Sunday, September 03, 2017 5:16 AM
To: user
Subject: Problem with CSV line break data in PySpark 2.1.0


I have a dataset where a few rows of column F, as shown below, contain line breaks in the CSV file.

[Inline image 1]

When Spark reads it, the text after the line break comes through as a completely new row, as shown below.

[Inline image 2]

I want PySpark 2.1.0 to read the row without splitting it at the line break after the date, but
that is not happening with the com.databricks.spark.csv reader I am using. Instead, nulls are
created after the date on line 2 for the remaining columns, from G to the end.

Could someone please help me with how to handle this?
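
One possible workaround on Spark 2.1.0 is to clean the file before Spark sees it. The sketch below uses Python's standard csv module, which treats a line break inside a quoted field as part of that field, and rewrites each record onto a single line. It assumes the affected values in column F are quoted in the file; if they are not, no parser can tell where a record ends. The function name flatten_embedded_newlines and the sample data are made up for illustration.

```python
import csv
import io


def flatten_embedded_newlines(src, dst, replacement=" "):
    """Copy CSV records from `src` to `dst`, replacing line breaks inside fields.

    `src` and `dst` are text file objects; for real files, open them with newline="".
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        cleaned_row = [
            field.replace("\r\n", replacement)
                 .replace("\n", replacement)
                 .replace("\r", replacement)
            for field in row
        ]
        writer.writerow(cleaned_row)


# Made-up sample: the second record has a line break inside quoted column F.
raw = 'id,F,G\n1,"2017-09-03\nnote",x\n2,2017-09-04,y\n'
out = io.StringIO()
flatten_embedded_newlines(io.StringIO(raw), out)
cleaned = out.getvalue()
```

After this step every record sits on one physical line, so a line-oriented reader no longer sees the text after the date as a separate row.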

