spark-user mailing list archives

From Nirav Patel <npa...@xactlycorp.com>
Subject Re: CSV parser - is there a way to find malformed csv record
Date Tue, 09 Oct 2018 22:09:44 GMT
Thanks, Shuporno. That mode worked. I found that a couple of records had
quotes inside quotes that needed to be escaped.
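
For anyone hitting the same problem, here is a rough sketch of the
line-by-line check Taylor suggested in the message quoted below, written in
plain Scala without Spark. The names `FindMalformedCsv`, `parseLine` and
`findMalformed` are made up for illustration and are not part of Spark or any
library. The splitter assumes comma-separated fields, optional double-quote
wrapping, and embedded quotes escaped by doubling them (`""`); it also assumes
one record per line, so a quoted field that legitimately spans several lines
(the `multiLine` case) would be reported as unterminated.

```scala
import scala.util.{Failure, Success, Try}

object FindMalformedCsv {
  // Hypothetical strict splitter (an assumption, not Spark's parser):
  // comma-separated fields, optional double-quote wrapping, embedded
  // quotes written as "". Anything else throws IllegalArgumentException.
  def parseLine(line: String): Vector[String] = {
    val fields = Vector.newBuilder[String]
    val cur = new StringBuilder
    var i = 0
    var inQuotes = false
    var afterQuote = false // true right after a closing quote
    while (i < line.length) {
      val c = line.charAt(i)
      if (inQuotes) {
        if (c == '"') {
          if (i + 1 < line.length && line.charAt(i + 1) == '"') {
            cur += '"' // doubled quote inside a quoted field
            i += 1
          } else {
            inQuotes = false
            afterQuote = true
          }
        } else cur += c
      } else if (c == ',') {
        fields += cur.toString
        cur.clear()
        afterQuote = false
      } else if (afterQuote) {
        throw new IllegalArgumentException(
          s"unexpected '$c' after closing quote at column ${i + 1}")
      } else if (c == '"') {
        if (cur.nonEmpty)
          throw new IllegalArgumentException(
            s"quote inside unquoted field at column ${i + 1}")
        inQuotes = true
      } else cur += c
      i += 1
    }
    if (inQuotes) throw new IllegalArgumentException("unterminated quoted field")
    fields += cur.toString
    fields.result()
  }

  // Returns (1-based line number, reason) for every line that fails to
  // parse or has a different number of fields than expected.
  def findMalformed(lines: Iterator[String], expectedColumns: Int): List[(Int, String)] =
    lines.zipWithIndex.flatMap { case (line, idx) =>
      val problem: Option[String] = Try(parseLine(line)) match {
        case Failure(e) => Some(e.getMessage)
        case Success(f) if f.length != expectedColumns =>
          Some(s"expected $expectedColumns fields, found ${f.length}")
        case Success(_) => None
      }
      problem.map(msg => (idx + 1, msg)).toList
    }.toList
}
```

To run it against a real file, something like
`FindMalformedCsv.findMalformed(scala.io.Source.fromFile(path).getLines(), n)`
should work, where `path` and the column count `n` are yours to supply. A line
with quotes inside quotes that are not doubled is reported together with its
line number.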



On Tue, Oct 9, 2018 at 1:40 PM Taylor Cox <Taylor.Cox@microsoft.com> wrote:

> Hey Nirav,
>
>
>
> Here’s an idea:
>
>
>
> Suppose your file.csv has N records, one per line. Read the CSV
> line by line (without Spark) and attempt to parse each line. If a record is
> malformed, catch the exception and rethrow it with the line number. That
> should show you where the problematic record(s) can be found.
>
>
>
> *From:* Nirav Patel <npatel@xactlycorp.com>
> *Sent:* Monday, October 8, 2018 11:57 AM
> *To:* spark users <user@spark.apache.org>
> *Subject:* CSV parser - is there a way to find malformed csv record
>
>
>
> I am getting `RuntimeException: Malformed CSV record` while parsing a CSV
> file and attaching a schema at the same time. Most likely some fields
> contain additional commas or JSON data that are not escaped properly. Is
> there a way for the CSV parser to tell me which record is malformed?
>
>
>
>
>
> This is what I am using:
>
>
>
>     val df2 = sparkSession.read
>       .option("inferSchema", true)
>       .option("multiLine", true)
>       .schema(headerDF.schema) // this only works without column mismatch
>       .csv(dataPath)
>
>
>
> Thanks
>
>
>
>
