[ https://issues.apache.org/jira/browse/SPARK-11941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032960#comment-15032960 ]
Henri DF commented on SPARK-11941:
----------------------------------
I wasn't trying to serialize to or from JSON using the Spark APIs; I was just getting the JSON
representation out in order to build a programmatic representation of the StructType in another
(non-Spark) environment. Recursing down the tree would be trivial if the layout were regular,
but it is painful with the current layout.
Anyway, with your question I think I better understand the intended use for this, and it does
indeed appear to work fine for ser/deser within Spark. So I get the rationale for making it
an "Improvement". Thanks!
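To illustrate the recursion pain point, here is a minimal sketch in Python (standing in for any non-Spark environment) of walking the current layout. The function name `describe` and the trimmed `SCHEMA_JSON` sample are hypothetical, made up for this example; the sample copies only fields "a" and "d" from the output quoted below. The walker has to check whether each "type" is a plain string or a nested object, and only the struct and array cases from this issue are handled.

```python
import json

# Trimmed copy of the schema JSON from the issue (fields "a" and "d" only).
SCHEMA_JSON = """
{
  "type": "struct",
  "fields": [
    {"name": "a", "type": "long", "nullable": true, "metadata": {}},
    {"name": "d",
     "type": {"type": "array", "elementType": "long", "containsNull": true},
     "nullable": true, "metadata": {}}
  ]
}
"""


def describe(dtype, name="root", depth=0):
    """Return one indented line per node of the schema tree.

    `dtype` is either a string (a primitive such as "long") or a dict
    whose own "type" key names the complex type; this is the
    irregularity discussed in the issue.
    """
    pad = "  " * depth
    if isinstance(dtype, str):
        return [f"{pad}{name}: {dtype}"]
    kind = dtype["type"]
    lines = [f"{pad}{name}: {kind}"]
    if kind == "struct":
        for field in dtype["fields"]:
            lines += describe(field["type"], field["name"], depth + 1)
    elif kind == "array":
        lines += describe(dtype["elementType"], "element", depth + 1)
    # Other complex types are not covered by this sketch.
    return lines


print("\n".join(describe(json.loads(SCHEMA_JSON))))
```

The string-or-object check at the top of `describe` is the extra branch that a uniform layout would make unnecessary.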
> JSON representation of nested StructTypes could be more uniform
> ---------------------------------------------------------------
>
> Key: SPARK-11941
> URL: https://issues.apache.org/jira/browse/SPARK-11941
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Henri DF
>
> I have a json file with a single row {code}{"a":1, "b": 1.0, "c": "asdfasd", "d":[1, 2, 4]}{code} After reading that file in, the schema is correctly inferred:
> {code}
> scala> df.printSchema
> root
>  |-- a: long (nullable = true)
>  |-- b: double (nullable = true)
>  |-- c: string (nullable = true)
>  |-- d: array (nullable = true)
>  |    |-- element: long (containsNull = true)
> {code}
> However, the json representation has a strange nesting under "type" for column "d":
> {code}
> scala> df.collect()(0).schema.prettyJson
> res60: String =
> {
>   "type" : "struct",
>   "fields" : [ {
>     "name" : "a",
>     "type" : "long",
>     "nullable" : true,
>     "metadata" : { }
>   }, {
>     "name" : "b",
>     "type" : "double",
>     "nullable" : true,
>     "metadata" : { }
>   }, {
>     "name" : "c",
>     "type" : "string",
>     "nullable" : true,
>     "metadata" : { }
>   }, {
>     "name" : "d",
>     "type" : {
>       "type" : "array",
>       "elementType" : "long",
>       "containsNull" : true
>     },
>     "nullable" : true,
>     "metadata" : { }
>   } ]
> }
> {code}
> Specifically, in the last element, "type" is an object instead of being a string. I would expect the last element to be:
> {code}
> {
>   "name":"d",
>   "type":"array",
>   "elementType":"long",
>   "containsNull":true,
>   "nullable":true,
>   "metadata":{}
> }
> {code}
> There's a similar issue for nested structs.
> (I ran into this while writing node.js bindings, wanted to recurse down this representation, which would be nicer if it was uniform...).
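For comparison, the uniform layout proposed in the issue can be derived from the current one on the consumer side. The following is a rough Python sketch under that assumption; `flatten_type` and `flatten_field` are hypothetical helper names, not part of any Spark API. It hoists the keys of a nested "type" object into the enclosing field and recurses into struct fields, but leaves a nested "elementType" object untouched.

```python
def flatten_type(dtype):
    """Return the type as a flat dict: {"type": ...} plus any extra keys."""
    if isinstance(dtype, str):
        return {"type": dtype}
    flat = dict(dtype)
    if dtype["type"] == "struct":
        # Nested structs carry their own field list; flatten those too.
        flat["fields"] = [flatten_field(f) for f in dtype["fields"]]
    return flat


def flatten_field(field):
    """Hoist a nested "type" object into the field itself."""
    rest = {k: v for k, v in field.items() if k != "type"}
    name = {"name": rest.pop("name")}
    return {**name, **flatten_type(field["type"]), **rest}


# Field "d" exactly as it appears in the current JSON representation.
d_field = {
    "name": "d",
    "type": {"type": "array", "elementType": "long", "containsNull": True},
    "nullable": True,
    "metadata": {},
}
print(flatten_field(d_field))
```

Fields whose "type" is already a string pass through unchanged, so the helper can be applied to every field of a schema.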