spark-issues mailing list archives

From "Huon Wilson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-26964) to_json/from_json do not match JSON spec due to not supporting scalars
Date Fri, 22 Feb 2019 05:40:00 GMT

    [ https://issues.apache.org/jira/browse/SPARK-26964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16774783#comment-16774783 ]

Huon Wilson commented on SPARK-26964:
-------------------------------------

We wish to store DataFrame columns as individual columns within a {{binary}}-based database
(HBase), which means encoding each field separately. JSON is a non-horrible way of encoding
values into the database: it can be handled from many languages/environments (and is even
human readable), and is very convenient to handle with DataFrames. I don't know of another
way to extract byte representations of individual columns that satisfies those constraints.
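
As a rough illustration of the per-field encoding described above, here is a minimal sketch in plain Python rather than Spark. The helper names {{encode_field}} and {{decode_field}} and the sample row are hypothetical, and a dict of bytes stands in for the HBase cells; it only shows that scalars, arrays and objects can all round-trip through JSON bytes one field at a time.

```python
import json


def encode_field(value):
    """Serialize one field value (scalar, list or dict) to UTF-8 JSON bytes."""
    return json.dumps(value).encode("utf-8")


def decode_field(raw):
    """Parse UTF-8 JSON bytes back into a Python value."""
    return json.loads(raw.decode("utf-8"))


# Hypothetical row: each field is encoded on its own, as it would be
# when every column is stored as a separate binary cell.
row = {"id": 42, "name": "spark", "active": True, "score": None, "tags": ["a", "b"]}

# A plain dict of bytes stands in for the database cells here.
cells = {column: encode_field(value) for column, value in row.items()}
decoded = {column: decode_field(raw) for column, raw in cells.items()}
```

The scalar fields ({{id}}, {{name}}, {{active}}, {{score}}) are exactly the cases that need scalar support; the array field already fits the object/array-only model.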

Looking at the source code, it seems like all of these scalar types are already supported in
JacksonGenerator and JacksonParser, so most of the work would be surfacing that support rather
than writing entirely new code. Is there something you expect to be more intricate than
additions to JsonToStructs and StructsToJson (and tests)? I'm considering having a look at this
myself, but if your intuition is that this is going to be a dead end, I will not.

> to_json/from_json do not match JSON spec due to not supporting scalars
> ----------------------------------------------------------------------
>
>                 Key: SPARK-26964
>                 URL: https://issues.apache.org/jira/browse/SPARK-26964
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.2, 2.4.0
>            Reporter: Huon Wilson
>            Priority: Major
>
> Spark SQL's {{to_json}} and {{from_json}} currently support arrays and objects, but not
> the scalar/primitive types. This doesn't match the JSON spec on https://www.json.org/ or [RFC8259|https://tools.ietf.org/html/rfc8259]:
> a JSON document ({{json: element}}) consists of a value surrounded by whitespace
> ({{element: ws value ws}}), where a value is an object or array _or_ a number or string etc.:
> {code:none}
> value
>     object
>     array
>     string
>     number
>     "true"
>     "false"
>     "null"
> {code}
> Having {{to_json}} and {{from_json}} support scalars would make them flexible enough
> for a library I'm working on, where an arbitrary (user-supplied) column needs to be turned
> into JSON.
> NB. these newer specs differ from the original [RFC4627|https://tools.ietf.org/html/rfc4627]
> (which is now obsolete) that (essentially) had {{value: object | array}}.
> This is related to SPARK-24391 and SPARK-25252, which added support for arrays of scalars.
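
The difference between the two grammars quoted above can be sketched in a few lines of plain Python (not Spark). The function names {{top_level_kind}} and {{valid_under_rfc4627}} are hypothetical. The sketch relies on Python's standard {{json}} module accepting any value at the top level, which roughly corresponds to the newer grammar, and then applies the stricter object-or-array rule by hand.

```python
import json


def top_level_kind(text):
    """Classify the top-level value of a JSON document."""
    value = json.loads(text)  # accepts any JSON value, with surrounding whitespace
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    return "scalar"  # string, number, true, false or null


def valid_under_rfc4627(text):
    """Apply the older rule that a document must be an object or an array."""
    return top_level_kind(text) in ("object", "array")
```

Under this sketch, documents such as {{1.5}}, {{"a"}} or {{null}} parse successfully but fail the older rule, which is the gap the issue describes.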



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

