spark-issues mailing list archives

From "M. Le Bihan (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-26968) option("quoteMode", "NON_NUMERIC") have no effect on a CSV generation
Date Mon, 25 Feb 2019 14:25:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-26968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16776903#comment-16776903 ]

M. Le Bihan edited comment on SPARK-26968 at 2/25/19 2:24 PM:
--------------------------------------------------------------

It's still a problem: I see no equivalent in Univocity that produces the result I expect, which is:

String values surrounded by quotes,
 but numeric values left unquoted.

Otherwise, a standard import of that CSV into Excel or OpenCalc cannot easily apply its
default type conversions.

 
{code:java}
"codeCommuneCR","nomCommuneCR","populationCR","resultatComptable"
"03142","LENAX",267,43{code}
This issue can be considered a regression if Univocity is unable to do this, because it
was possible before. The issue should be closed only once this result can be obtained again.
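
For reference, the quoting rule I am asking for can be sketched in plain Java, independent of Spark and Univocity. {{NonNumericCsv}}, {{quote}} and {{formatRow}} are hypothetical names used only for illustration. This is not how Spark or Univocity write CSV, just a minimal statement of the expected behavior, assuming that any {{java.lang.Number}} counts as numeric and every other value is treated as a string.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Spark or Univocity) illustrating NON_NUMERIC quoting. */
final class NonNumericCsv {
    private NonNumericCsv() {}

    /** Wraps a string value in double quotes, doubling any embedded quote character. */
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /**
     * Formats one CSV row. Assumption: Number instances are written unquoted,
     * null becomes an empty field, and every other value is quoted.
     */
    static String formatRow(List<?> values) {
        List<String> cells = new ArrayList<>();
        for (Object value : values) {
            if (value == null) {
                cells.add("");
            } else if (value instanceof Number) {
                cells.add(value.toString());
            } else {
                cells.add(quote(value.toString()));
            }
        }
        return String.join(",", cells);
    }
}
```

Applied to the data row above, {{NonNumericCsv.formatRow(Arrays.asList("03142", "LENAX", 267, 43))}} gives {{"03142","LENAX",267,43}}: the commune code stays a quoted string and keeps its leading zero, while the two integers are left bare.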

 

Please don't close this issue too early.

 

P.S.: In addition, I don't understand why Databricks would keep the previous CSV system, as
shown on the master branch [on line 504 of this unit test|https://github.com/databricks/spark-csv/blob/master/src/test/scala/com/databricks/spark/csv/CsvSuite.scala],
which still uses and checks the results of NON_NUMERIC in particular,

while it has been replaced with _Univocity_ in spark_core or spark_sql without checking that
it can still produce all the same results as before.


was (Author: mlebihan):
It's still a problem: I see no equivalent in Univocity that produces the result I expect, which is:

String values surrounded by quotes,
 but numeric values left unquoted.

Otherwise, a standard import of that CSV into Excel or OpenCalc cannot easily apply its
default type conversions.

 
{code:java}
"codeCommuneCR","nomCommuneCR","populationCR","resultatComptable"
"03142","LENAX",267,43{code}
This issue can be considered a regression if Univocity is unable to do this, because it
was possible before. The issue should be closed only once this result can be obtained again.

 

Please don't close this issue too early.

 

P.S.: In addition, I don't understand why Databricks would keep the previous CSV system, as
shown on the master branch:

[https://github.com/databricks/spark-csv/blob/master/src/test/scala/com/databricks/spark/csv/CsvSuite.scala]

with the unit test on line 504,

while it has been replaced with _Univocity_ in spark_core or spark_sql without checking that
it can still produce all the same results as before.

> option("quoteMode", "NON_NUMERIC") have no effect on a CSV generation
> ---------------------------------------------------------------------
>
>                 Key: SPARK-26968
>                 URL: https://issues.apache.org/jira/browse/SPARK-26968
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: M. Le Bihan
>            Priority: Minor
>
> I have a CSV to write with this schema:
> {code:java}
> StructType s = schema.add("codeCommuneCR", StringType, false);
> s = s.add("nomCommuneCR", StringType, false);
> s = s.add("populationCR", IntegerType, false);
> s = s.add("resultatComptable", IntegerType, false);{code}
> Whether I omit the "_quoteMode_" option or set it to {{NON_NUMERIC}}, like this:
> {code:java}
> ds.coalesce(1).write().mode(SaveMode.Overwrite)
>     .option("header", "true")
>     .option("quoteMode", "NON_NUMERIC")
>     .option("quote", "\"")
>     .csv("./target/out_200071470.csv");{code}
> the CSV written by {{Spark}} is:
> {code:java}
> codeCommuneCR,nomCommuneCR,populationCR,resultatComptable
> 03142,LENAX,267,43{code}
> If I set the "_quoteAll_" option instead, like this:
> {code:java}
> ds.coalesce(1).write().mode(SaveMode.Overwrite)
>     .option("header", "true")
>     .option("quoteAll", true)
>     .option("quote", "\"")
>     .csv("./target/out_200071470.csv");{code}
> it generates :
> {code:java}
> "codeCommuneCR","nomCommuneCR","populationCR","resultatComptable"
> "03142","LENAX","267","43"{code}
> It seems that the {{.option("quoteMode", "NON_NUMERIC")}} is broken. It should generate:
>  
> {code:java}
> "codeCommuneCR","nomCommuneCR","populationCR","resultatComptable"
> "03142","LENAX",267,43
> {code}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


