spark-user mailing list archives

From Corey Nolet <>
Subject Re: MapType vs StructType
Date Fri, 17 Jul 2015 20:41:36 GMT
This helps immensely. Thanks Michael!

On Fri, Jul 17, 2015 at 4:33 PM, Michael Armbrust <> wrote:

> I'll add that there is a JIRA to override the default past some threshold
> number of unique keys:
> <>
> On Fri, Jul 17, 2015 at 1:32 PM, Michael Armbrust <>
> wrote:
>> The difference between a map and a struct here is that in a struct all
>> possible keys are defined as part of the schema, and each can have a
>> different type (and we don't support union types).  JSON doesn't have
>> differentiated data structures, so we go with the one that gives you more
>> information when doing inference by default.  If you pass in a schema to
>> JSON, however, you can override this and have a JSON object parsed as a map.
>> On Fri, Jul 17, 2015 at 11:02 AM, Corey Nolet <> wrote:
>>> I notice JSON objects are all parsed as Map[String,Any] in Jackson, but
>>> for some reason the "inferSchema" tools in Spark SQL extract the schema
>>> of nested JSON objects as StructTypes.
>>> This makes it really confusing when trying to rectify the object
>>> hierarchy when I have maps because the Catalyst conversion layer underneath
>>> is expecting a Row or Product and not a Map.
>>> Why wasn't MapType used here? Is there any significant difference
>>> between the two of these types that would cause me not to use a MapType
>>> when I'm constructing my own schema representing a set of nested
>>> Map[String,_]'s?
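
The struct-versus-map distinction described in the thread can be sketched in plain Python. This is only an illustration under stated assumptions, not Spark's inference code: the helper names `infer_type` and `map_type`, the type labels, and the sample record are all hypothetical. It shows that struct-style inference records every key with its own type, while a map needs a single value type shared by all keys.

```python
import json


def infer_type(value):
    """Infer a toy schema for a parsed JSON value (hypothetical helper)."""
    if isinstance(value, dict):
        # Struct-style: every key becomes a named field with its own type.
        return {"struct": {k: infer_type(v) for k, v in sorted(value.items())}}
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value in this sketch: {value!r}")


def map_type(obj):
    """Describe a JSON object as a map: string keys, one shared value type."""
    types = [infer_type(v) for v in obj.values()]
    if any(t != types[0] for t in types):
        # Without union types, mixed value types cannot be a single map.
        raise TypeError("map values must all have the same type")
    return {"map": {"key": "string", "value": types[0] if types else "null"}}


record = json.loads(
    '{"id": 1, "tags": {"a": "x", "b": "y"}, "user": {"name": "n", "age": 3}}'
)

struct_schema = infer_type(record)     # nested objects become structs
tags_as_map = map_type(record["tags"])  # uniform values fit a map
```

Here `record["user"]` mixes a string and a number, so `map_type` rejects it, whereas the struct form keeps both fields with their own types. In Spark itself, the override Michael describes corresponds to supplying an explicit schema whose field is a `MapType` when reading the JSON, rather than relying on inference.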
