spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Cazen Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-12537) Add option to accept quoting of all character backslash quoting mechanism
Date Sat, 02 Jan 2016 13:03:39 GMT

    [ https://issues.apache.org/jira/browse/SPARK-12537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15076510#comment-15076510
] 

Cazen Lee commented on SPARK-12537:
-----------------------------------

Happy New Year!
The situation seemed to require further discussion
Tell me what I can to help on this issue
Thank you

> Add option to accept quoting of all character backslash quoting mechanism
> -------------------------------------------------------------------------
>
>                 Key: SPARK-12537
>                 URL: https://issues.apache.org/jira/browse/SPARK-12537
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.5.2
>            Reporter: Cazen Lee
>            Assignee: Apache Spark
>
> We can provides the option to choose JSON parser can be enabled to accept quoting of
all character or not.
> For example, if JSON file that includes not listed by JSON backslash quoting specification,
it returns corrupt_record
> {code:title=JSON File|borderStyle=solid}
> {"name": "Cazen Lee", "price": "$10"}
> {"name": "John Doe", "price": "\$20"}
> {"name": "Tracy", "price": "$10"}
> {code}
> corrupt_record(returns null)
> {code}
> scala> df.show
> +--------------------+---------+-----+
> |     _corrupt_record|     name|price|
> +--------------------+---------+-----+
> |                null|Cazen Lee|  $10|
> |{"name": "John Do...|     null| null|
> |                null|    Tracy|  $10|
> +--------------------+---------+-----+
> {code}
> And after apply this patch, we can enable allowBackslashEscapingAnyCharacter option like
below
> {code}
> scala> val df = sqlContext.read.option("allowBackslashEscapingAnyCharacter", "true").json("/user/Cazen/test/test2.txt")
> df: org.apache.spark.sql.DataFrame = [name: string, price: string]
> scala> df.show
> +---------+-----+
> |     name|price|
> +---------+-----+
> |Cazen Lee|  $10|
> | John Doe|  $20|
> |    Tracy|  $10|
> +---------+-----+
> {code}
> This issue similar to HIVE-11825, HIVE-12717.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message