drill-user mailing list archives

From Kristine Hahn <kh...@maprtech.com>
Subject Re: Nested or Array JSON
Date Fri, 03 Apr 2015 14:49:26 GMT
You can solve the "Needed to be in state INIT or IN_VARCHAR but in mode
IN_BIGINT" error by using all_text_mode to resolve the schema differences, as
described in
http://apache.github.io/drill/docs/json-data-model/handling-type-differences.
For example, on my JDBC connection:

    select * from dfs.`/Users/opendata.json` limit 1;
    Query failed: Query stopped., Needed to be in state INIT or IN_VARCHAR
    but in mode IN_BIGINT [ da707fe9-e62c-4e9b-a62a-49b7cab37dfd on
    10.0.0.6:31010 ]
    . . .
    0: jdbc:drill:zk=local> ALTER SYSTEM SET `store.json.all_text_mode` = true;
    +------------+------------+
    |     ok     |  summary   |
    +------------+------------+
    | true       | store.json.all_text_mode updated. |
    +------------+------------+
    1 row selected (0.047 seconds)
    0: jdbc:drill:zk=local> select * from dfs.`/Users/opendata.json` limit 1;
    +------------+------------+
    |    meta    |    data    |
    +------------+------------+
    | {"view":{"id":"n2rk-fwkj","name":"Unclaimed bank accounts","averageRating":"0","category":"Government","
    . . .

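To make the all_text_mode fix a little more concrete, here is a rough Python model of what the option changes, assuming the error comes from values of different JSON types (for example integers and strings in the same array) arriving where Drill expects a single type. It is only an illustration of the idea "read every scalar as text", not Drill's JSON reader; the helper names `to_all_text` and `scalar_types` and the sample record are hypothetical.

```python
import json


def to_all_text(value):
    """Recursively turn JSON scalars into strings, roughly like all_text_mode.

    Maps and arrays keep their shape and nulls stay null. This sketch
    re-serializes parsed numbers, so the text can differ from the original
    token (e.g. 1.50 becomes "1.5"); a real reader would keep the raw text.
    """
    if isinstance(value, dict):
        return {k: to_all_text(v) for k, v in value.items()}
    if isinstance(value, list):
        return [to_all_text(v) for v in value]
    if value is None or isinstance(value, str):
        return value
    # bool, int, float: use their JSON spelling ("true", "42", "3.5")
    return json.dumps(value)


def scalar_types(values):
    """Return the set of Python type names among the non-null scalars in a list."""
    return {type(v).__name__ for v in values if v is not None}


# Hypothetical record whose rows mix integers and strings, the kind of input
# that can push a typed reader from one mode (BIGINT) into another (VARCHAR).
record = json.loads('{"data": [[1, "abc", null, 1428000000], [2, "def", 3.5, true]]}')

print(sorted(scalar_types(record["data"][0])))       # ['int', 'str']
text_record = to_all_text(record)
print(sorted(scalar_types(text_record["data"][0])))  # ['str']
print(text_record["data"][1])                        # ['2', 'def', '3.5', 'true']
```

Once every scalar is text, each array holds one type, which is why the query succeeds after the option is set; any numeric work then needs an explicit CAST.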
Now, how exactly to flatten that big array is another question; answer TBD.

Kristine Hahn
Sr. Technical Writer
415-497-8107 @krishahn


On Fri, Apr 3, 2015 at 5:41 AM, Muthu Pandi <muthu1086@gmail.com> wrote:

> I tried FLATTEN, but the result is the same. Kindly help with some pointers.
>
> "ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
> SELECT * FROM `HDFS`.`root`.`./user/hadoop2/unclaimedaccount.json` LIMIT
> 100
> [30024]Query execution error. Details:[
> Query stopped., Needed to be in state INIT or IN_VARCHAR but in mode
> IN_BIGINT [ 7185da78-7759-4a8d-aebb-005f067a12e7 on nn01:31010 ]
>
> ] "
>
>
>
> Regards,
> Muthupandi.K
>
>  Think before you print.
>
>
>
> On Fri, Apr 3, 2015 at 10:12 AM, Muthu Pandi <muthu1086@gmail.com> wrote:
>
> > Thank you, Jason, for your detailed answer.
> >
> > I will try using FLATTEN on the data column and let you know the status.
> >
> > The error message I got from ODBC is:
> >
> > "ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
> > SELECT * FROM `HDFS`.`root`.`./user/hadoop2/unclaimedaccount.json` LIMIT
> > 100
> > [30024]Query execution error. Details:[
> > Query stopped., Needed to be in state INIT or IN_VARCHAR but in mode
> > IN_BIGINT [ 7185da78-7759-4a8d-aebb-005f067a12e7 on nn01:31010 ]
> >
> > ] "
> >
> > Is there any way to normalise or convert this nested data to simpler JSON
> > so that I can work with it in Drill?
> >
> >
> >
> > Regards,
> > Muthupandi.K
> >
> >
> >
> >
> > On Thu, Apr 2, 2015 at 9:23 PM, Jason Altekruse <altekrusejason@gmail.com>
> > wrote:
> >
> >> To answer Andries' question: with an enhancement in the 0.8 release,
> >> there should be no hard limit on the size of the Drill records supported.
> >> That being said, Drill is not fundamentally designed for processing
> >> enormous rows, so we do not have a clear idea of the performance impact
> >> of working with such datasets.
> >>
> >> This document will initially be read as a single record, and I think
> >> the 0.8 release should be able to read it in. From there, flatten should
> >> be able to produce individual records suitable for further analysis;
> >> these records will be a more reasonable size and should give you good
> >> performance.
> >>
> >> -Jason
> >>
> >> On Thu, Apr 2, 2015 at 8:49 AM, Jason Altekruse <altekrusejason@gmail.com>
> >> wrote:
> >>
> >> > Hi Muthu,
> >> >
> >> > Welcome to the Drill community!
> >> >
> >> > Unfortunately, the mailing list does not allow attachments; please
> >> > paste the error log into the body of a message.
> >> >
> >> > If you are working with the 0.7 version of Drill, I would recommend
> >> > upgrading to the new 0.8 release that just came out; it includes a lot
> >> > of bug fixes and enhancements.
> >> >
> >> > We're glad to hear you have been successful with your previous efforts
> >> > with Drill. Unfortunately, Drill is not well suited for exploring
> >> > datasets like the one you have linked to. By default, Drill expects
> >> > records in the format accepted by MongoDB for bulk import, where each
> >> > record is a separate top-level JSON object.
> >> >
> >> > This dataset follows a pattern we have seen before, but one that Drill
> >> > is currently not well suited to. All of the data is in a single JSON
> >> > object. At the top of the object are a number of dataset-wide metadata
> >> > fields, all nested under a field "view" (itself inside "meta"), while
> >> > the main data, which I am guessing you want to analyze, is nested in an
> >> > array under the field "data". While this format is not ideal for Drill,
> >> > given the size of the dataset you might be able to get it working with
> >> > an operator in Drill that makes the data more accessible.
> >> >
> >> > The operator is called flatten. It takes an array and produces an
> >> > individual record for each element in the array. Optionally, other
> >> > fields from the record can be included alongside each of the newly
> >> > produced records, preserving the relationship between each element and
> >> > the record it came from.
> >> >
> >> > For more info on flatten, see this page in the wiki:
> >> > https://cwiki.apache.org/confluence/display/DRILL/FLATTEN+Function
> >> >
> >> > For this dataset, you might be able to access the data simply by
> >> > running the following:
> >> >
> >> > select flatten(data) from dfs.`/path/to/file.json`;
> >> >
> >> > If you need access to some of the other fields from the top of the
> >> > dataset, you can include them alongside flatten and they will be copied
> >> > into each record produced by the flatten operation. Because these
> >> > fields are nested, reference them through a table alias:
> >> >
> >> > select flatten(t.data), t.meta.view.id, t.meta.view.category from
> >> > dfs.`/path/to/file.json` t;
> >> >
> >> >
> >> >
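As a rough illustration of the FLATTEN semantics described in this message (one output record per array element, with other requested fields copied into each output record), here is a small in-memory Python model. It mirrors the behavior only, not Drill's implementation; the function names `flatten_records` and `get_path` and the sample rows are hypothetical, and the document shape (a "meta" field holding {"view": ...} plus a "data" array) follows the query output Kristine shows at the top of this thread.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def get_path(record: Dict[str, Any], path: List[str]) -> Any:
    """Follow a list of keys into nested maps, e.g. ["meta", "view", "id"]."""
    value: Any = record
    for key in path:
        value = value[key]
    return value


def flatten_records(
    records: Iterable[Dict[str, Any]],
    array_field: str,
    copied: Dict[str, List[str]],
) -> Iterator[Dict[str, Any]]:
    """Yield one output record per element of record[array_field].

    `copied` maps an output column name to a path of keys; those values are
    repeated in every record produced from the same input record. In this
    sketch an empty or missing array produces no output records.
    """
    for record in records:
        extras = {name: get_path(record, path) for name, path in copied.items()}
        for element in record.get(array_field) or []:
            yield {array_field: element, **extras}


# Hypothetical single-record document: metadata under "meta" -> "view",
# rows stored as arrays under "data".
doc = json.loads(
    '{"meta": {"view": {"id": "n2rk-fwkj", "category": "Government"}},'
    ' "data": [["1", "first row"], ["2", "second row"]]}'
)

columns = {"id": ["meta", "view", "id"], "category": ["meta", "view", "category"]}
for out in flatten_records([doc], "data", columns):
    print(out)
# {'data': ['1', 'first row'], 'id': 'n2rk-fwkj', 'category': 'Government'}
# {'data': ['2', 'second row'], 'id': 'n2rk-fwkj', 'category': 'Government'}
```

The single large input record becomes as many small records as there are rows in "data", which is what makes the flattened output easier to analyze.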
> >> > On Wed, Apr 1, 2015 at 10:52 PM, Muthu Pandi <muthu1086@gmail.com>
> >> > wrote:
> >> >
> >> >> Hi All,
> >> >>
> >> >> I am new to the JSON format and am exploring it. I have used Drill to
> >> >> analyse simple JSON files, which worked like a charm, but I am not able
> >> >> to load this JSON file for analysis:
> >> >> "https://opendata.socrata.com/api/views/n2rk-fwkj/rows.json?accessType=DOWNLOAD"
> >> >>
> >> >> I am using the ODBC connector to connect to Drill 0.8. Kindly find the
> >> >> error in the attachment.
> >> >>
> >> >>
> >> >>
> >> >> Regards,
> >> >> Muthupandi.K
> >> >>
> >> >>
> >> >>
> >> >>
> >> >
> >>
> >
> >
>
