drill-user mailing list archives

From Parth Chandra <par...@apache.org>
Subject Re: best approach for complex, several levels of json nesting
Date Wed, 08 Jun 2016 18:55:13 GMT
Well, your best bets are still JSON and Parquet. Parquet is more compact
and therefore likely to be faster. Internally, Drill will keep nested data
as a nested type and will not flatten it out unless you want it to. Even
with the nested structure, you can refer to individual fields without
having to flatten the data out.
If you want the nested structures to be flattened, then you will need to
use FLATTEN and KVGEN. With multiple levels you will end up with fairly
complex queries as you will need to unravel one level at a time in a
subquery. The usual way people achieve this is by creating views for each
subquery.
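
To illustrate the "one level at a time" idea, here is a rough sketch in plain Python rather than Drill SQL. The helpers `kvgen` and `flatten` are hypothetical stand-ins that mimic the general behavior of Drill's KVGEN (a map becomes a list of key/value pairs) and FLATTEN (one output row per list element). The record layout and the field names (`id`, `data`, `site`, `readings`) are made up for the example. Each "level" plays the role of one view built on top of the previous one.

```python
from typing import Any, Dict, Iterable, Iterator, List


def kvgen(mapping: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn a map into a list of {"key": ..., "value": ...} pairs."""
    return [{"key": k, "value": v} for k, v in mapping.items()]


def flatten(rows: Iterable[Dict[str, Any]], column: str) -> Iterator[Dict[str, Any]]:
    """Emit one output row per element of the list stored in `column`."""
    for row in rows:
        for element in row[column]:
            out = dict(row)        # copy the other columns unchanged
            out[column] = element  # replace the list with a single element
            yield out


# Hypothetical record: a map of sites, each holding a map of readings.
record = {
    "id": 1,
    "data": {
        "site_a": {"temp": 20, "volts": 12},
        "site_b": {"temp": 25},
    },
}

# Level 1 (think: first view): one row per site.
level1 = [
    {"id": r["id"], "site": r["data"]["key"], "readings": r["data"]["value"]}
    for r in flatten([{**record, "data": kvgen(record["data"])}], "data")
]

# Level 2 (think: second view, defined on the first): one row per reading.
level2 = [
    {
        "id": r["id"],
        "site": r["site"],
        "metric": r["readings"]["key"],
        "value": r["readings"]["value"],
    }
    for r in flatten(
        [{**r1, "readings": kvgen(r1["readings"])} for r1 in level1],
        "readings",
    )
]

for row in level2:
    print(row)
```

Each extra level of nesting needs one more kvgen/flatten step like the ones above, which is why wrapping each step in its own view tends to keep the final query readable.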

On Wed, Jun 8, 2016 at 10:36 AM, Scott Kinney <scott.kinney@stem.com> wrote:

> We have lots of different JSON structures gzipped in S3 that we want to
> query (currently looking at Drill and Druid). What is the best approach for
> getting this into a queryable format for Drill? I tried
> FLATTEN(KVGEN(data)), but since our structures are often nested multiple
> levels deep, this doesn't work. We have also converted to Parquet, but when I
> run Drill on a Parquet file the structure isn't getting flattened either.
>
> What is the best approach for this situation?
>
> Thanks all,
>
>
>
> ________________________________
> Scott Kinney | DevOps
> stem <http://www.stem.com/>   |   m  510.282.1299
> 100 Rollins Road, Millbrae, California 94030
