spark-user mailing list archives

From Taylor Cox <Taylor....@microsoft.com.INVALID>
Subject RE: CSV parser - is there a way to find malformed csv record
Date Tue, 09 Oct 2018 20:40:36 GMT
Hey Nirav,

Here’s an idea:

Suppose your file.csv has N records, one per line. Read the file line by line (without Spark)
and attempt to parse each line. If a record is malformed, catch the exception and rethrow it
with the line number. That will show you where the problematic record(s) are.
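
A minimal Scala sketch of that idea. It assumes one record per line, comma delimiters and
double-quote escaping (`""` inside a quoted field). The names `CsvLineCheck`, `parseLine`,
`checkLines`, `checkFile` and `MalformedLineException` are hypothetical, and the tiny parser
only stands in for whatever CSV parser you would really use:

```scala
import scala.io.Source

// Hypothetical exception type carrying the 1-based line number of the bad record
final class MalformedLineException(val lineNumber: Int, val line: String, cause: Throwable)
  extends RuntimeException(s"Malformed CSV record at line $lineNumber: $line", cause)

object CsvLineCheck {
  // Minimal stand-in parser: splits on commas outside double quotes and treats ""
  // inside a quoted field as an escaped quote. Throws IllegalArgumentException if a
  // quote is left open or the field count differs from expectedFields.
  def parseLine(line: String, expectedFields: Int): Vector[String] = {
    val fields = Vector.newBuilder[String]
    val current = new StringBuilder
    var inQuotes = false
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (c == '"') {
        if (inQuotes && i + 1 < line.length && line.charAt(i + 1) == '"') {
          current += '"' // escaped quote inside a quoted field
          i += 1
        } else inQuotes = !inQuotes
      } else if (c == ',' && !inQuotes) {
        fields += current.toString
        current.clear()
      } else current += c
      i += 1
    }
    if (inQuotes) throw new IllegalArgumentException("unterminated quoted field")
    fields += current.toString
    val result = fields.result()
    if (result.length != expectedFields)
      throw new IllegalArgumentException(s"expected $expectedFields fields but found ${result.length}")
    result
  }

  // Parses every line; on the first failure, rethrows with the line number attached
  def checkLines(lines: Iterator[String], expectedFields: Int): Unit =
    lines.zipWithIndex.foreach { case (line, idx) =>
      try parseLine(line, expectedFields)
      catch {
        case e: IllegalArgumentException => throw new MalformedLineException(idx + 1, line, e)
      }
    }

  // Convenience wrapper for a file on the local file system (UTF-8 assumed)
  def checkFile(path: String, expectedFields: Int): Unit = {
    val source = Source.fromFile(path, "UTF-8")
    try checkLines(source.getLines(), expectedFields)
    finally source.close()
  }
}
```

For a file expected to have, say, 5 columns, `CsvLineCheck.checkFile("file.csv", 5)` would
throw a `MalformedLineException` naming the first offending line. Because the reader in the
question sets `multiLine`, quoted fields that legitimately contain newlines would be reported
here as unterminated quotes; like the suggestion above, this check assumes one record per line.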

From: Nirav Patel <npatel@xactlycorp.com>
Sent: Monday, October 8, 2018 11:57 AM
To: spark users <user@spark.apache.org>
Subject: CSV parser - is there a way to find malformed csv record

I am getting `RuntimeException: Malformed CSV record` while parsing CSV records and applying a
schema at the same time. Most likely some fields contain extra commas or JSON data that are not
escaped properly. Is there a way to make the CSV parser tell me which record is malformed?


This is what I am using:

    val df2 = sparkSession.read
      .option("inferSchema", true) // ignored here: the explicit .schema(...) below takes precedence
      .option("multiLine", true)
      .schema(headerDF.schema) // this only works without column mismatch
      .csv(dataPath)

Thanks
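
For comparison, Spark's CSV reader can keep malformed rows instead of failing: in `PERMISSIVE`
mode, with a `StringType` column named by `columnNameOfCorruptRecord` added to the schema, the
raw text of each record that does not fit the schema is stored in that column (it gives the
record text, not a line number). A hedged sketch; `FindMalformedCsv` and `malformedRows` are
hypothetical names, and details vary between Spark versions, so check it against yours:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object FindMalformedCsv {
  // Name of the extra column that receives the raw text of malformed records
  val CorruptCol = "_corrupt_record"

  // Reads the CSV with the given schema plus a corrupt-record column and
  // returns only the rows Spark could not parse against that schema
  def malformedRows(spark: SparkSession, dataPath: String, schema: StructType): DataFrame = {
    val withCorrupt = schema.add(StructField(CorruptCol, StringType, nullable = true))
    val df = spark.read
      .option("mode", "PERMISSIVE")                   // keep bad rows instead of failing
      .option("columnNameOfCorruptRecord", CorruptCol)
      .option("multiLine", true)                      // same setting as the reader above
      .schema(withCorrupt)
      .csv(dataPath)
      .cache() // Spark 2.3+ rejects raw-CSV queries that reference only the corrupt column (e.g. a count)
    df.filter(col(CorruptCol).isNotNull)
  }
}
```

With the names from the question, `FindMalformedCsv.malformedRows(sparkSession, dataPath,
headerDF.schema).show(false)` would list the offending records in full.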


