spark-issues mailing list archives

From "Henri DF (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-11941) JSON representation of nested StructTypes could be more uniform
Date Tue, 01 Dec 2015 02:15:11 GMT

    [ https://issues.apache.org/jira/browse/SPARK-11941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032931#comment-15032931 ]

Henri DF edited comment on SPARK-11941 at 12/1/15 2:14 AM:
-----------------------------------------------------------

I think "might be nicer if it was flat" is a bit of an understatement.

The current representation isn't of much use with nested structs. If it's hard to fix, wouldn't
it be better to make this private rather than leave it exposed in its current state?
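To make the nested-struct point concrete, here is a minimal Scala sketch of what a consumer has to do today when recursing over the schema JSON shown in the issue description quoted below: a field's "type" is sometimes a bare string and sometimes a nested object with its own "type" tag, so the walker must branch on both shapes. It assumes the JSON has already been parsed into a tiny hand-rolled model; `Json`, `JStr`, `JBool`, `JArr`, `JObj`, `SchemaWalker` and `IssueExample` are all hypothetical names, not part of Spark's API.

```scala
// Hypothetical minimal JSON model standing in for whatever parser a binding
// uses; these names are illustrative and are not part of Spark's API.
sealed trait Json
final case class JStr(value: String) extends Json
final case class JBool(value: Boolean) extends Json
final case class JArr(items: List[Json]) extends Json
final case class JObj(fields: List[(String, Json)]) extends Json {
  def get(key: String): Option[Json] = fields.collectFirst { case (`key`, v) => v }
}

object SchemaWalker {
  private def required(obj: JObj, key: String): Json =
    obj.get(key).getOrElse(sys.error(s"missing key: $key"))

  // A "type" value is either a bare string (primitives such as "long") or an
  // object carrying its own "type" tag (complex types such as "array" or
  // "struct"), so a recursive consumer has to handle both shapes.
  def render(tpe: Json): String = tpe match {
    case JStr(primitive) => primitive
    case obj: JObj =>
      required(obj, "type") match {
        case JStr("array") =>
          s"array<${render(required(obj, "elementType"))}>"
        case JStr("struct") =>
          val fields = required(obj, "fields") match {
            case JArr(items) => items
            case other       => sys.error(s"expected an array of fields, got $other")
          }
          fields.map {
            case f: JObj =>
              val name = required(f, "name") match {
                case JStr(n) => n
                case other   => sys.error(s"expected a string name, got $other")
              }
              s"$name:${render(required(f, "type"))}"
            case other => sys.error(s"expected a field object, got $other")
          }.mkString("struct<", ",", ">")
        // Other tags (e.g. "map") are omitted from this sketch.
        case other => sys.error(s"unhandled type tag: $other")
      }
    case other => sys.error(s"unexpected type node: $other")
  }
}

object IssueExample {
  private def field(name: String, tpe: Json): JObj =
    JObj(List("name" -> JStr(name), "type" -> tpe,
      "nullable" -> JBool(true), "metadata" -> JObj(Nil)))

  // The schema from this issue, in the shape prettyJson currently emits.
  val schema: JObj = JObj(List(
    "type" -> JStr("struct"),
    "fields" -> JArr(List(
      field("a", JStr("long")),
      field("b", JStr("double")),
      field("c", JStr("string")),
      field("d", JObj(List("type" -> JStr("array"),
        "elementType" -> JStr("long"), "containsNull" -> JBool(true))))
    ))
  ))

  def main(args: Array[String]): Unit =
    println(SchemaWalker.render(schema))
}
```

Running `IssueExample.main` prints `struct<a:long,b:double,c:string,d:array<long>>`. A struct-typed field goes through the same object branch, which is why every level of the recursion needs the string-or-object check.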


was (Author: henridf):
I think "might be nicer if it was flat" is a bit of an understatement.

The current representation isn't of much use with nested structs. If it's hard to fix, would
it be better to remove this rather than leave it in its current state?

> JSON representation of nested StructTypes could be more uniform
> ---------------------------------------------------------------
>
>                 Key: SPARK-11941
>                 URL: https://issues.apache.org/jira/browse/SPARK-11941
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Henri DF
>
> I have a json file with a single row {code}{"a":1, "b": 1.0, "c": "asdfasd", "d":[1, 2, 4]}{code}
> After reading that file in, the schema is correctly inferred:
> {code}
> scala> df.printSchema
> root
>  |-- a: long (nullable = true)
>  |-- b: double (nullable = true)
>  |-- c: string (nullable = true)
>  |-- d: array (nullable = true)
>  |    |-- element: long (containsNull = true)
> {code}
> However, the json representation has a strange nesting under "type" for column "d":
> {code}
> scala> df.collect()(0).schema.prettyJson
> res60: String = 
> {
>   "type" : "struct",
>   "fields" : [ {
>     "name" : "a",
>     "type" : "long",
>     "nullable" : true,
>     "metadata" : { }
>   }, {
>     "name" : "b",
>     "type" : "double",
>     "nullable" : true,
>     "metadata" : { }
>   }, {
>     "name" : "c",
>     "type" : "string",
>     "nullable" : true,
>     "metadata" : { }
>   }, {
>     "name" : "d",
>     "type" : {
>       "type" : "array",
>       "elementType" : "long",
>       "containsNull" : true
>     },
>     "nullable" : true,
>     "metadata" : { }
>   }]
> }
> {code}
> Specifically, in the last element, "type" is an object instead of being a string. I would
> expect the last element to be:
> {code}
>       {
>          "name":"d",
>          "type":"array",
>          "elementType":"long",
>          "containsNull":true,
>          "nullable":true,
>          "metadata":{}
>       }
> {code}
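> There's a similar issue for nested structs: a struct-typed field's "type" is likewise an object ({"type":"struct","fields":[...]}) rather than a string.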
> (I ran into this while writing Node.js bindings and wanted to recurse down this representation,
> which would be nicer if it were uniform...).
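The flattening proposed in the issue can be sketched as a transformation over already-parsed schema JSON. This is a hypothetical Scala illustration, not Spark code: it re-declares a small JSON model (`Json`, `JStr`, `JBool`, `JArr`, `JObj`, all assumed names) so it runs on its own, and `Flatten` splices each field's nested "type" object into the field itself. Applied to field "d", that yields exactly the expected form given in the issue. The issue does not say how deeper nesting (for example an `elementType` that is itself a struct) should look, so the sketch leaves such values untouched.

```scala
// Hypothetical minimal JSON model; not part of Spark's API.
sealed trait Json
final case class JStr(value: String) extends Json
final case class JBool(value: Boolean) extends Json
final case class JArr(items: List[Json]) extends Json
final case class JObj(fields: List[(String, Json)]) extends Json

object Flatten {
  // Flatten each field object in a "fields" array.
  private def flattenFieldArray(v: Json): Json = v match {
    case JArr(items) => JArr(items.map(flattenField))
    case other       => other
  }

  // If a field's "type" is an object, splice that object's entries into the
  // field in its place, so "type" always ends up as a string. Nested "fields"
  // arrays (struct-typed fields) are flattened recursively; objects under
  // other keys, such as "elementType", are deliberately left unchanged.
  def flattenField(f: Json): Json = f match {
    case JObj(entries) =>
      JObj(entries.flatMap {
        case ("type", JObj(inner)) =>
          inner.map {
            case ("fields", fs) => ("fields", flattenFieldArray(fs))
            case kv             => kv
          }
        case kv => List(kv)
      })
    case other => other
  }

  // Apply the transformation to the fields of a top-level struct schema.
  def flattenSchema(schema: Json): Json = schema match {
    case JObj(entries) =>
      JObj(entries.map {
        case ("fields", fs) => ("fields", flattenFieldArray(fs))
        case kv             => kv
      })
    case other => other
  }
}
```

Splicing keeps the original key order, so "d" comes out as name, type, elementType, containsNull, nullable, metadata, matching the expected element in the issue; primitive fields pass through unchanged.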



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


