drill-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Nested collections (e.g. JSON arrays) and drill queries
Date Fri, 26 Oct 2012 04:47:34 GMT
If it is the WITHIN clause that you are interested in, then at the physical
plan layer this is expressed as EXPLODE/AGGREGATE.  EXPLODE creates a batched
data flow which contains the values from the nested collection.  AGGREGATE
then injects the aggregated results back into the original records.
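
As a rough illustration of the EXPLODE/AGGREGATE pattern described above, here
is a minimal Python sketch.  It is not Drill code: the function names
explode() and aggregate(), the record id used to tie values back to their
parent, and the output field "countX" are all assumptions made for the
example.  The sample record is the one from Evan's message, plus a second
record with an empty collection.

```python
from collections import defaultdict

# Sample data: the record from the original question, plus one record
# whose nested collection is empty.
records = [
    {
        "propertyA": "valueA",
        "propertyB": [
            {"propertyX": "value1", "propertyY": "value2"},
            {"propertyX": "value3", "propertyY": "value4"},
        ],
    },
    {"propertyA": "valueB", "propertyB": []},
]


def explode(records, field):
    """Hypothetical EXPLODE: emit one (record_id, element) pair per nested value."""
    for record_id, record in enumerate(records):
        for element in record.get(field, []):
            yield record_id, element


def aggregate(records, exploded, out_field, agg):
    """Hypothetical AGGREGATE: group exploded values by parent record and
    attach the aggregate to a copy of each original record."""
    groups = defaultdict(list)
    for record_id, element in exploded:
        groups[record_id].append(element)
    return [
        dict(record, **{out_field: agg(groups[record_id])})
        for record_id, record in enumerate(records)
    ]


# Count the propertyX values within each record.
result = aggregate(
    records,
    explode(records, "propertyB"),
    "countX",
    lambda elements: sum(1 for e in elements if "propertyX" in e),
)
```

Each output record keeps its original fields and gains the per-record
aggregate, so a record with an empty collection still appears in the output.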

How this is implemented at the execution layer is more flexible.  The
EXPLODE/AGGREGATE pattern could be recognized and optimized into a loop
that explicitly does the aggregation, especially for well-known aggregates.
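
To make the optimization concrete, here is a minimal Python sketch of what
such a fused loop might look like for a simple count.  This is only an
illustration and not how Drill implements it: the function name
count_within() and the output field name are assumptions made for the
example.

```python
def count_within(records, field, key):
    """Hypothetical fused form of EXPLODE/AGGREGATE for a count.

    Rather than materializing the exploded values and regrouping them,
    walk each record's nested collection once and keep a running count.
    """
    out = []
    for record in records:
        count = 0
        for element in record.get(field, []):
            if key in element:
                count += 1
        # Attach the aggregate to a copy of the original record.
        out.append(dict(record, **{"count_" + key: count}))
    return out


records = [
    {
        "propertyA": "valueA",
        "propertyB": [
            {"propertyX": "value1", "propertyY": "value2"},
            {"propertyX": "value3", "propertyY": "value4"},
        ],
    },
    {"propertyA": "valueB", "propertyB": []},
]

fused = count_within(records, "propertyB", "propertyX")
```

The loop needs no intermediate grouping structure, which is why this kind of
rewrite is attractive for well-known aggregates such as counts and sums.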

On Fri, Oct 26, 2012 at 12:43 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> Does the WITHIN clause help?  In BigQuery, this is described here:
> https://developers.google.com/bigquery/docs/query-reference#within
> On Thu, Oct 25, 2012 at 2:51 PM, Evan Pollan <evan.pollan@gmail.com> wrote:
>> Hi,
>> I attended Tomer's Strata/HadoopWorld presentation on Drill yesterday, and
>> was very impressed.  Lots of features that map directly to my needs.
>> He specifically cited support for, on the HDFS side, JSON/BSON, avro, and
>> sequence files and emphasized the ability to access nested data.  We use
>> JSON heavily, so it sounds like Drill would support base-case queries over
>> nested properties within my dataset.  One question I didn't get the chance
>> to ask, though:  what about querying over records with nested collections?
>>  For example, I have some JSON datasets that look like:
>> {
>>     "propertyA": "valueA",
>>     "propertyB": [
>>         {
>>             "propertyX": "value1",
>>             "propertyY": "value2"
>>         },
>>         {
>>             "propertyX": "value3",
>>             "propertyY": "value4"
>>         }
>>     ]
>> }
>> In this case, I have users that would like to be able to access
>> propertyB.propertyX and leverage it in joins and aggregations.  Since each
>> record has N propertyB.propertyX values, though, I'm wondering how Drill's
>> query planner and execution engine would handle this?
>> thanks,
>> Evan
