drill-dev mailing list archives

From Paul Rogers <prog...@mapr.com>
Subject Re: JSON reader enhancement
Date Sun, 19 Nov 2017 03:33:58 GMT
Hi Arina,

The proposal is to represent 2D arrays as a string (using the original, unparsed JSON). That
is, given this input:

{a: “fred”, b: [[10, 20, 30], [11, 21, 31]]}

The parsed columns are:

a, b
“fred”, “[[10, 20, 30], [11, 21, 31]]”

Notice that column b is just a string. It is a string of JSON, yes, but still just a string.
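
To make that concrete, here is a minimal Python sketch of what a client might do with such a value. It is only an illustration: the `row` dictionary is a hypothetical stand-in for a row returned to the client, not Drill's actual client API.

```python
import json

# Hypothetical row as a client might receive it: column b is plain text.
row = {"a": "fred", "b": "[[10, 20, 30], [11, 21, 31]]"}

# From Drill's point of view, b is just a string ...
assert isinstance(row["b"], str)

# ... so the client parses it with its own JSON parser.
matrix = json.loads(row["b"])
first_row_total = sum(matrix[0])
print(matrix[1][2], first_row_total)
```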

So, the question about kvgen/flatten does not apply here since we are not creating a Drill
array.

There is a very interesting discussion to be had about how Drill does/should handle “non-relational”
JSON structures. But, here, the suggestion is just for one very simple special case.
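
A rough sketch of that special case, in Python rather than Drill's Java, and not the actual reader code: the function names `is_2d_array` and `read_record` are made up for illustration. One simplification to note: the sketch re-serializes the nested array with `json.dumps`, whereas the proposal keeps the original, unparsed JSON text. Keys are quoted here because Python's `json` module requires strict JSON.

```python
import json

def is_2d_array(value):
    """True if value is an array that directly contains another array."""
    return isinstance(value, list) and any(isinstance(v, list) for v in value)

def read_record(line):
    """Parse one JSON record; keep 2-D array fields as JSON text."""
    record = json.loads(line)
    return {
        key: json.dumps(value) if is_2d_array(value) else value
        for key, value in record.items()
    }

row = read_record('{"a": "fred", "b": [[10, 20, 30], [11, 21, 31]]}')
print(row)
```

Ordinary (1-D) arrays and scalar fields pass through unchanged in this sketch; only the nested array ends up as a string.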

Thanks,

- Paul

> On Nov 18, 2017, at 7:15 AM, Arina Yelchiyeva <arina.yelchiyeva@gmail.com> wrote:
> 
> In general sounds good.
> If a user applies kvgen / flatten over such 2-D array columns read as
> strings, will they be able to normalize the data into the format they want?
> Or do we need to come up with a new function?
> 
> Kind regards
> Arina
> 
> On Fri, Nov 17, 2017 at 10:39 PM, Paul Rogers <progers@mapr.com> wrote:
> 
>> Hi All,
>> 
>> I’d like to propose a minor enhancement to the JSON reader to better
>> handle non-relational JSON structures. (See DRILL-5974 [1].)
>> 
>> As background, Drill handles simple tuples:
>> 
>> {a: 10, b: “fred”}
>> 
>> Drill also handles arrays:
>> 
>> {name: “fred”, hobbies: [“bowling”, “golf”]}
>> 
>> Drill even handles arrays of tuples:
>> 
>> {name: “fred”, orders: [
>>  {id: 1001, amount: 12.34},
>>  {id: 1002, amount: 56.78}]}
>> 
>> The above are termed "relational" because there is a straightforward
>> mapping to/from tables into the above JSON structures.
>> 
>> Things get interesting with non-relational types, such as 2-D arrays:
>> 
>> {id: 4, shape: “square”, points: [[0, 0], [0, 5], [5, 0], [5, 5]]}
>> 
>> Drill has two solutions:
>> 
>> * Turn on the experimental list and union support.
>> * Enable all-text mode to read all fields as JSON text.
>> 
>> Here, I’d like to propose a middle ground:
>> 
>> * Read fields with relational types into vectors.
>> * Read non-relational fields using text mode.
>> 
>> Thus, the first three examples would all result in the JSON data parsed
>> into Drill vectors. But, the fourth, non-relational example would produce a
>> row that looks like this:
>> 
>> id, shape, points
>> 4, “square”, “[[0, 0], [0, 5], [5, 0], [5, 5]]”
>> 
>> Although Drill can’t parse the 2-D array, Drill will pass the array along
>> to the client, which can use its favorite JSON parser to parse the array
>> and do something useful (such as drawing the square, in this case).
>> 
>> In particular, the proposal:
>> 
>> * Apply this change only to the revised “batch size aware” JSON reader.
>> * Use the above parsing model by default.
>> * Use the experimental list-and-union support if the existing
>> `exec.enable_union_type` system/session option is set.
>> 
>> Existing queries should “just work.” In fact, now JSON with non-relational
>> types will work “out-of-the-box” without all-text mode or the experimental
>> types.
>> 
>> Thoughts?
>> 
>> - Paul
>> 
>> [1] https://issues.apache.org/jira/browse/DRILL-5974
>> 
>> 
>> 
