spark-user mailing list archives

From Koert Kuipers <ko...@tresata.com>
Subject Re: CSV escaping not working
Date Thu, 27 Oct 2016 18:17:44 GMT
i can see how unquoted csv would work if you escape delimiters, but i have
never seen that in practice.
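
For reference, the unquoted-but-escaped reading described above could be sketched in plain Scala as follows. This is only an illustration of the idea, not how Spark's csv reader behaves. The object `UnquotedCsv`, the method `splitEscaped`, and the sample row are all hypothetical.

```scala
// Hypothetical helper, not part of Spark: splits one unquoted CSV line,
// treating `escape` followed by any character as that literal character.
object UnquotedCsv {
  def splitEscaped(line: String,
                   delimiter: Char = ',',
                   escape: Char = '\\'): Vector[String] = {
    val fields = Vector.newBuilder[String]
    val current = new StringBuilder
    var i = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (c == escape && i + 1 < line.length) {
        // Keep the escaped character literally and skip past it.
        current.append(line.charAt(i + 1))
        i += 2
      } else if (c == delimiter) {
        // An unescaped delimiter closes the current field.
        fields += current.toString
        current.clear()
        i += 1
      } else {
        current.append(c)
        i += 1
      }
    }
    fields += current.toString // last field (may be empty)
    fields.result()
  }
}
```

With an assumed sample row, `UnquotedCsv.splitEscaped("""1,north rocks\,au,2000""")` yields three fields, the second being `north rocks,au`. A real parser would also have to decide what an escape character at the very end of a line means. This sketch simply keeps it as data.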

On Thu, Oct 27, 2016 at 2:03 PM, Jain, Nishit <njain1@underarmour.com>
wrote:

> I’d think quoting is only necessary if you are not escaping delimiters in
> data. But we can only share our opinions. It would be good to see something
> documented.
> This may be the cause of the issue?: https://issues.apache.org/jira/browse/CSV-135
>
> From: Koert Kuipers <koert@tresata.com>
> Date: Thursday, October 27, 2016 at 12:49 PM
>
> To: "Jain, Nishit" <njain1@underarmour.com>
> Cc: "user@spark.apache.org" <user@spark.apache.org>
> Subject: Re: CSV escaping not working
>
> well my expectation would be that if you have delimiters in your data you
> need to quote your values. if you now have quotes within your data you
> need to escape them.
>
> so escaping is only necessary if quoted.
>
> On Thu, Oct 27, 2016 at 1:45 PM, Jain, Nishit <njain1@underarmour.com>
> wrote:
>
>> Do you mind sharing why escaping should not work without quotes?
>>
>> From: Koert Kuipers <koert@tresata.com>
>> Date: Thursday, October 27, 2016 at 12:40 PM
>> To: "Jain, Nishit" <njain1@underarmour.com>
>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>> Subject: Re: CSV escaping not working
>>
>> that is what i would expect: escaping only works if quoted
>>
>> On Thu, Oct 27, 2016 at 1:24 PM, Jain, Nishit <njain1@underarmour.com>
>> wrote:
>>
>>> Interesting finding: Escaping works if data is quoted but not otherwise.
>>>
>>> From: "Jain, Nishit" <njain1@underarmour.com>
>>> Date: Thursday, October 27, 2016 at 10:54 AM
>>> To: "user@spark.apache.org" <user@spark.apache.org>
>>> Subject: CSV escaping not working
>>>
>>> I am using spark-core version 2.0.1 with Scala 2.11. I have simple code
>>> to read a csv file that contains \ escapes.
>>>
>>> val myDA = spark.read
>>>     .option("quote", null)
>>>     .schema(mySchema)
>>>     .csv(filePath)
>>>
>>> As per the documentation, \ is the default escape for the csv reader, but it
>>> does not work: Spark reads \ as part of my data. For example, the City column
>>> in the csv file is *north rocks\,au* . I expect the city column to be read in
>>> code as *northrocks,au*, but instead Spark reads it as *northrocks\* and
>>> moves *au* to the next column.
>>>
>>> I have tried the following, but none of it worked:
>>>
>>>    - Explicitly defined escape .option("escape","\\")
>>>    - Changed escape to | or : in file and in code
>>>    - I have tried using spark-csv library
>>>
>>> Is anyone else facing the same issue? Am I missing something?
>>>
>>> Thanks
>>>
>>
>>
>
