spark-issues mailing list archives

From "M. Le Bihan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-26968) option("quoteMode", "NON_NUMERIC") have no effect on a CSV generation
Date Mon, 25 Feb 2019 14:02:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-26968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776903#comment-16776903 ]

M. Le Bihan edited comment on SPARK-26968 at 2/25/19 2:01 PM:
--------------------------------------------------------------

It's still a problem.
I see no equivalent option with Univocity that produces the result I expect, which is:

String values surrounded by quotes,
but numeric values left unquoted.

Otherwise, a standard import of that CSV into Excel or OpenCalc cannot easily apply
its default type conversions.

"codeCommuneCR","nomCommuneCR","populationCR","resultatComptable"
"03142","LENAX",267,43

This issue can be treated as a regression if Univocity is unable to do this, because it
was possible before. The issue should be closed only once this result can be produced again.

Please don't close this issue too early.
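
To make the expected NON_NUMERIC rule concrete, here is a minimal plain-Java sketch of it. It uses neither Spark nor Univocity, and it is not how either library implements quoting. The class and method names (NonNumericQuoting, quoteField, formatRow) are hypothetical, and two details are assumptions on my side: embedded quotes are escaped by doubling them, and null values are written as empty fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the NON_NUMERIC quoting rule only.
public class NonNumericQuoting {

    // Numbers are written as-is; every other value is wrapped in double quotes.
    // Assumption: embedded quotes are doubled, and null becomes an empty field.
    static String quoteField(Object value) {
        if (value == null) {
            return "";
        }
        if (value instanceof Number) {
            return value.toString();
        }
        return "\"" + value.toString().replace("\"", "\"\"") + "\"";
    }

    // Joins the quoted fields of one row with commas.
    static String formatRow(Object... values) {
        List<String> fields = new ArrayList<>();
        for (Object v : values) {
            fields.add(quoteField(v));
        }
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        // Header names are strings, so all of them are quoted.
        System.out.println(formatRow("codeCommuneCR", "nomCommuneCR", "populationCR", "resultatComptable"));
        // The two string columns are quoted, the two integer columns are not.
        System.out.println(formatRow("03142", "LENAX", 267, 43));
    }
}
```

With the row from this issue, the string columns keep their quotes (so "03142" keeps its leading zero on import) while 267 and 43 stay bare numbers.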


was (Author: mlebihan):
It's still a problem, 
I see no equivalent with Univocity to obtain the result I expect, which is  :

String values surrounded by quotes
But the numeric values, not.

Else, the classic importation of that CSV in an Excel or OpenCalc program cannot easily do
default conversions.

> option("quoteMode", "NON_NUMERIC") have no effect on a CSV generation
> ---------------------------------------------------------------------
>
>                 Key: SPARK-26968
>                 URL: https://issues.apache.org/jira/browse/SPARK-26968
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: M. Le Bihan
>            Priority: Minor
>
> I have a CSV to write with this schema:
> {code:java}
> StructType s = schema.add("codeCommuneCR", StringType, false);
> s = s.add("nomCommuneCR", StringType, false);
> s = s.add("populationCR", IntegerType, false);
> s = s.add("resultatComptable", IntegerType, false);{code}
> If I don't provide the "_quoteMode_" option, or even if I set it to {{NON_NUMERIC}}, like this:
> {code:java}
> ds.coalesce(1).write().mode(SaveMode.Overwrite)
>     .option("header", "true").option("quoteMode", "NON_NUMERIC").option("quote", "\"")
>     .csv("./target/out_200071470.csv");{code}
> the CSV written by {{Spark}} is:
> {code:java}
> codeCommuneCR,nomCommuneCR,populationCR,resultatComptable
> 03142,LENAX,267,43{code}
> If I set the "_quoteAll_" option instead, like this:
> {code:java}
> ds.coalesce(1).write().mode(SaveMode.Overwrite)
>     .option("header", "true").option("quoteAll", true).option("quote", "\"")
>     .csv("./target/out_200071470.csv");{code}
> it generates :
> {code:java}
> "codeCommuneCR","nomCommuneCR","populationCR","resultatComptable"
> "03142","LENAX","267","43"{code}
> It seems that {{.option("quoteMode", "NON_NUMERIC")}} is broken. It should generate:
>  
> {code:java}
> "codeCommuneCR","nomCommuneCR","populationCR","resultatComptable"
> "03142","LENAX",267,43
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


