spark-user mailing list archives

From Corey Nolet <cjno...@gmail.com>
Subject Re: MapType vs StructType
Date Fri, 17 Jul 2015 20:41:36 GMT
This helps immensely. Thanks Michael!

On Fri, Jul 17, 2015 at 4:33 PM, Michael Armbrust <michael@databricks.com>
wrote:

> I'll add that there is a JIRA to override the default past some threshold
> on the number of unique keys: https://issues.apache.org/jira/browse/SPARK-4476
>
> On Fri, Jul 17, 2015 at 1:32 PM, Michael Armbrust <michael@databricks.com>
> wrote:
>
>> The difference between a map and a struct here is that in a struct all
>> possible keys are defined as part of the schema and each can have a
>> different type (and we don't support union types).  JSON doesn't have
>> differentiated data structures so we go with the one that gives you more
>> information when doing inference by default.  If you pass in a schema to
>> JSON however, you can override this and have a JSON object parsed as a map.
>>
>> On Fri, Jul 17, 2015 at 11:02 AM, Corey Nolet <cjnolet@gmail.com> wrote:
>>
>>> I notice JSON objects are all parsed as Map[String,Any] in Jackson but
>>> for some reason, the "inferSchema" tools in Spark SQL extract the schema
>>> of nested JSON objects as StructTypes.
>>>
>>> This makes it really confusing when trying to rectify the object
>>> hierarchy when I have maps because the Catalyst conversion layer underneath
>>> is expecting a Row or Product and not a Map.
>>>
>>> Why wasn't MapType used here? Is there any significant difference
>>> between the two of these types that would cause me not to use a MapType
>>> when I'm constructing my own schema representing a set of nested
>>> Map[String,_]'s?
>>>
>>
>
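
The struct-versus-map distinction described in the thread can be sketched
in plain Python. This is a toy model only, not Spark's inference code: the
function names infer_type and as_map_type and the tuple-based schema
encoding are hypothetical, chosen for illustration. It assumes that a
struct records every key with its own type, while a map keeps keys out of
the schema and needs a single value type for all entries.

```python
import json


def infer_type(value):
    """Toy inference: a JSON object becomes a struct (hypothetical helper)."""
    if isinstance(value, dict):
        # Struct: every key is part of the schema and has its own type.
        return ("struct", {key: infer_type(value[key]) for key in sorted(value)})
    if isinstance(value, bool):  # checked before int, since bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    raise TypeError(f"unsupported JSON value: {value!r}")


def as_map_type(obj):
    """Describe the same JSON object as a map (hypothetical helper).

    Keys are not recorded in the schema, so all values must share one type;
    mixed value types are rejected because there are no union types.
    """
    value_types = {json.dumps(infer_type(v), sort_keys=True) for v in obj.values()}
    if len(value_types) > 1:
        raise TypeError("map values must share one type (no union types)")
    value_type = json.loads(value_types.pop()) if value_types else "null"
    return ("map", "string", value_type)


record = json.loads('{"name": "a", "age": 3}')
counts = json.loads('{"x": 1, "y": 2}')

print(infer_type(record))   # struct with one typed field per key
print(as_map_type(counts))  # map from string keys to a single value type
```

In this model, record only fits the struct form because its values have
different types, while counts fits either form. That mirrors the point
above: the struct carries more information, so it is the inference
default, and a map has to be requested through an explicit schema.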
