drill-user mailing list archives

From Jason Altekruse <altekruseja...@gmail.com>
Subject Re: Nested or Array JSON
Date Thu, 02 Apr 2015 15:49:27 GMT
Hi Muthu,

Welcome to the Drill community!

Unfortunately, the mailing list does not allow attachments. Please copy the
error log into the body of a mail message and send it along.

If you are working with the 0.7 version of Drill, I would recommend
upgrading to the new 0.8 release that just came out; it includes a lot of
bug fixes and enhancements.

We're glad to hear you have been successful with your previous efforts with
Drill. Unfortunately, Drill is not well suited for exploring datasets like
the one you linked to. By default, Drill expects records in the format
accepted by MongoDB for bulk import, where each record is a separate JSON
object.
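The following is a small Python sketch that makes the difference between the two layouts concrete. It illustrates the layout only and is not Drill's JSON reader. It treats each top-level JSON object in a file as one record. A file written one object per record yields one record per object. A file that wraps everything in a single object, with metadata and a "data" array, yields exactly one record. The function name iter_records and the sample documents are hypothetical.

```python
import json


def iter_records(text):
    """Yield each top-level JSON value in `text` as one record.

    Hypothetical helper modeling the "one JSON object per record" layout:
    objects are simply concatenated, typically one per line.
    """
    decoder = json.JSONDecoder()
    pos = 0
    n = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < n and text[pos].isspace():
            pos += 1
        if pos >= n:
            return
        # Parse one complete JSON value starting at `pos`.
        record, pos = decoder.raw_decode(text, pos)
        yield record


# One object per line: each object is its own record.
per_record = '{"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n{"id": 3, "name": "c"}\n'

# Everything wrapped in one object: metadata plus a "data" array.
single_object = '{"view": {"id": "example"}, "data": [[1, "a"], [2, "b"], [3, "c"]]}'

print(len(list(iter_records(per_record))))     # 3
print(len(list(iter_records(single_object))))  # 1
```

In the second case, all of the rows are buried inside a single record's "data" field. That is why a dataset shaped like this needs extra work before its rows can be queried individually.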

Looking at this dataset, it follows a pattern we have seen before but that
Drill is currently not well suited to handle. All of the data is in a single
JSON object. At the top of the object are a number of dataset-wide metadata
fields, all nested under a field "view"; the main data, which I am guessing
you want to analyze, is stored in an array under the field "data". While this
format is not ideal for Drill, given the size of the dataset you might be able
to work with it using an operator in Drill that makes the data more
accessible.

The operator is called flatten. It takes an array and produces a separate
record for each element of the array. Optionally, other fields from the
original record can be included alongside each newly produced record, so
that every element stays associated with the record it came from.
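The behavior described above can be modeled in a few lines of Python. This sketch captures the semantics only and is not Drill's implementation. The names flatten_records and get_path, the dotted-path convention for carried fields, and the sample metadata values are all hypothetical. In this model, a record whose array is empty or missing produces no output rows.

```python
def get_path(record, path):
    """Follow a dotted path such as "view.id" into nested dicts; None if missing."""
    value = record
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def flatten_records(records, array_field, carry=()):
    """Model of flatten: one output row per element of record[array_field].

    Each path in `carry` is copied into every output row, so each element
    stays linked to the record it came from.
    """
    for record in records:
        for element in record.get(array_field) or []:
            out = {array_field: element}
            for path in carry:
                out[path] = get_path(record, path)
            yield out


# Hypothetical document shaped like the dataset described above.
doc = {
    "view": {"id": "abcd-1234", "category": "Example"},
    "data": [[1, "x"], [2, "y"]],
}

for row in flatten_records([doc], "data", carry=("view.id", "view.category")):
    print(row)
# {'data': [1, 'x'], 'view.id': 'abcd-1234', 'view.category': 'Example'}
# {'data': [2, 'y'], 'view.id': 'abcd-1234', 'view.category': 'Example'}
```

The loop mirrors the idea of selecting flatten(data) together with view.id and view.category. The single input record becomes two rows, and each row carries a copy of the metadata fields.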

For more info on flatten, see this page in the wiki:
https://cwiki.apache.org/confluence/display/DRILL/FLATTEN+Function

For this dataset, you might be able to get access to the data simply by
running the following:

select flatten(data) from dfs.`/path/to/file.json`;

If you need to have access to some of the other fields from the top of the
dataset, you can include them alongside flatten and they will be copied
into each record produced by the flatten operation:

select flatten(t.data), t.`view`.id, t.`view`.category from dfs.`/path/to/file.json` t;



On Wed, Apr 1, 2015 at 10:52 PM, Muthu Pandi <muthu1086@gmail.com> wrote:

> Hi All
>
>
> I am new to the JSON format and am exploring it. I have used Drill to
> analyze simple JSON files, which works like a charm, but I am not able to
> load this JSON file for analysis:
> https://opendata.socrata.com/api/views/n2rk-fwkj/rows.json?accessType=DOWNLOAD
>
> I am using the ODBC connector to connect to Drill 0.8. Please find the
> error in the attachment.
>
> Regards,
> Muthupandi.K
