drill-user mailing list archives

From Evan Pollan <evan.pol...@gmail.com>
Subject Nested collections (e.g. JSON arrays) and drill queries
Date Thu, 25 Oct 2012 18:51:20 GMT

I attended Tomer's Strata/HadoopWorld presentation on Drill yesterday, and
was very impressed.  Lots of features that map directly to my needs.

He specifically cited support, on the HDFS side, for JSON/BSON, Avro, and
sequence files, and emphasized the ability to access nested data.  We use
JSON heavily, so it sounds like Drill would support base-case queries over
nested properties within my dataset.  One question I didn't get the chance
to ask, though:  what about querying over records with nested collections?
For example, I have some JSON datasets that look like:

    {
        "propertyA": "valueA",
        "propertyB": [
            { "propertyX": "value1", "propertyY": "value2" },
            { "propertyX": "value3", "propertyY": "value4" }
        ]
    }

In this case, I have users who would like to be able to access
propertyB.propertyX and leverage it in joins and aggregations.  Since each
record has N propertyB.propertyX values, though, I'm wondering how Drill's
query planner and execution engine would handle this.
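
To make the question concrete: one possible semantics is to flatten each
record into one row per propertyB element before joining or aggregating.
Below is a minimal Python sketch of that idea.  The sample records and the
`flatten` helper are hypothetical, made up for illustration, and nothing
here reflects how Drill actually plans or executes such a query.

```python
import json
from collections import Counter

# Hypothetical sample records shaped like the dataset described above.
RECORDS = [
    json.loads("""{"propertyA": "valueA", "propertyB": [
        {"propertyX": "value1", "propertyY": "value2"},
        {"propertyX": "value3", "propertyY": "value4"}]}"""),
    json.loads("""{"propertyA": "valueB", "propertyB": [
        {"propertyX": "value1", "propertyY": "value5"}]}"""),
]


def flatten(record, collection="propertyB"):
    """Yield one flat row per element of the nested collection.

    Hypothetical helper: parent fields are copied onto every row, and
    nested fields get a dotted name such as "propertyB.propertyX".
    """
    parent = {k: v for k, v in record.items() if k != collection}
    for element in record.get(collection, []):
        row = dict(parent)
        for key, value in element.items():
            row[f"{collection}.{key}"] = value
        yield row


# One row per (record, propertyB element) pair.
rows = [row for record in RECORDS for row in flatten(record)]

# Example aggregation: how often each propertyB.propertyX value occurs.
counts = Counter(row["propertyB.propertyX"] for row in rows)
```

Under this interpretation a record with N propertyB elements contributes N
rows, so any aggregate over propertyA would count that record N times, and
a record with an empty propertyB contributes no rows at all.  Whether those
are the desired semantics is part of the question.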

