drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Muthu Pandi <muthu1...@gmail.com>
Subject Re: Nested or Array JSON
Date Sat, 04 Apr 2015 07:28:54 GMT
Thankyou Kristin and Jason for the pointers. Am able to Flatten the data
and begin to query.

One thing is that after  setting this  ALTER SYSTEM SET
`store.json.all_text_mode` =true; i got error as this

"ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
SELECT * FROM `HDFS`.`default`.`./user/hadoop2/unclaimedaccount.json` LIMIT
100
[30024]Query execution error. Details:[
Query stopped., Invalid UTF-8 start byte 0x96
 at [Source: org.apache.drill.exec.vector.complex.fn.JsonReader@3e1a1569;
line: 5221, column: 120] [ 6eaaccba-6992-43ec-8e4a-016d8e3c486f on
nn01:31010 ]

]"

When i dig in to that data i found UTF-8 character which is not recognized
after removing the below line am able to use the data.

[ 4527, "5F67D9EC-51E3-46D6-8384-3107170CE049", 4527, 1279730940, "388535",
1279730940, "388535", null,* "WILLINGDON � 1980 HOMECOMING *         ",
null, "217.83", "C O  K EWONIAK 12527 52A AVE ", "EDMONTON ",
"1983-03-01T12:00:00", "CANADIAN IMPERIAL BANK OF COMMERCE" ]

Is there anything which i can do to include this data as well.





*RegardsMuthupandi.K*

 Think before you print.



On Fri, Apr 3, 2015 at 11:00 PM, Kristine Hahn <khahn@maprtech.com> wrote:

> Here are some examples of queries on the actual data you are using:
>
> This query extracts some data from the "meta" map.
>
> select t.meta.`view`.`id` from
> dfs.`/Users/khahn/Documents/test_files_source/opendata.json` t;
>
> > +------------+
> >
> > |   EXPR$0   |
> >
> > +------------+
> >
> > | n2rk-fwkj  |
> >
> > +------------+
> >
> > 1 row selected (0.102 seconds)
> >
> > As shown here, you need to use a table alias when querying complex data
> per the docs. The back ticks are used to escape a reserved word, such as
> view. Maybe id is reserved too.
>
> This query extracts some data from the "data" array at the bottom of the
> file:
>
> select data[2][1] from
> dfs.`/Users/khahn/Documents/test_files_source/opendata.json`;
>
> > +------------+
> >
> > |   EXPR$0   |
> >
> > +------------+
> >
> > | 8CDB805D-1C8D-434D-AC5B-2D8130F48841 |
> >
> > +------------+
> >
> >
>
>
>
>
>
> Kristine Hahn
> Sr. Technical Writer
> 415-497-8107 @krishahn
>
>
> On Fri, Apr 3, 2015 at 5:41 AM, Muthu Pandi <muthu1086@gmail.com> wrote:
>
> > Tried with the Flatten but the result is same , Kindly help with pointers
> >
> > "ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
> > SELECT * FROM `HDFS`.`root`.`./user/hadoop2/unclaimedaccount.json` LIMIT
> > 100
> > [30024]Query execution error. Details:[
> > Query stopped., Needed to be in state INIT or IN_VARCHAR but in mode
> > IN_BIGINT [ 7185da78-7759-4a8d-aebb-005f067a12e7 on nn01:31010 ]
> >
> > ] "
> >
> >
> >
> > *RegardsMuthupandi.K*
> >
> >  Think before you print.
> >
> >
> >
> > On Fri, Apr 3, 2015 at 10:12 AM, Muthu Pandi <muthu1086@gmail.com>
> wrote:
> >
> > > Thankyou Jason for ur detailed answer.
> > >
> > > Will try to use the Flatten on data column and let u know the status.
> > >
> > > Error message got from ODBC is
> > >
> > > "ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
> > > SELECT * FROM `HDFS`.`root`.`./user/hadoop2/unclaimedaccount.json`
> LIMIT
> > 100
> > > [30024]Query execution error. Details:[
> > > Query stopped., Needed to be in state INIT or IN_VARCHAR but in mode
> > > IN_BIGINT [ 7185da78-7759-4a8d-aebb-005f067a12e7 on nn01:31010 ]
> > >
> > > ] "
> > >
> > > Is there any way to normalise or convert this nested data to simpler
> JSON
> > > so that i can play with DRILL?
> > >
> > >
> > >
> > > *RegardsMuthupandi.K*
> > >
> > >  Think before you print.
> > >
> > >
> > >
> > > On Thu, Apr 2, 2015 at 9:23 PM, Jason Altekruse <
> > altekrusejason@gmail.com>
> > > wrote:
> > >
> > >> To answer Andries' question, with an enhancement in the 0.8 release,
> > there
> > >> should be no hard limit on the size of Drill records supported. That
> > being
> > >> said, Drill is not fundamentally set up for processing enormous rows,
> so
> > >> we
> > >> do not have a clear idea of the performance impact of working with
> such
> > >> datasets.
> > >>
> > >> This document is going to be read as a single record originally, and I
> > >> think the 0.8 release should be able to read it in. From there,
> flatten
> > >> should be able to produce individual records suitable for further
> > >> analysis,
> > >> these records will be be a more reasonable size and get you good
> > >> performance for further analysis.
> > >>
> > >> -Jason
> > >>
> > >> On Thu, Apr 2, 2015 at 8:49 AM, Jason Altekruse <
> > altekrusejason@gmail.com
> > >> >
> > >> wrote:
> > >>
> > >> > Hi Muthu,
> > >> >
> > >> > Welcome to the Drill community!
> > >> >
> > >> > Unfortunately the mailing list does not allow attachments, please
> send
> > >> > along the error log copied into a mail message.
> > >> >
> > >> > If you are working with the 0.7 version of Drill, I would recommend
> > >> > upgrading the the new 0.8 release that just came out, there were a
> lot
> > >> of
> > >> > bug fixes and enhancements in the release.
> > >> >
> > >> > We're glad to hear you have been successful with your previous
> efforts
> > >> > with Drill. Unfortunately Drill is not well suited fro exploring
> > >> datasets
> > >> > like the one you have linked to. By default Drill supports records
> of
> > >> the
> > >> > format accepted by Mongo DB for bulk import, where individual
> records
> > >> take
> > >> > the form of a JSON object.
> > >> >
> > >> > Looking at this dataset, it follows a pattern we have seen before,
> but
> > >> > currently are not well suited for working with in Drill. All of the
> > >> data is
> > >> > in a single JSON object, at the top of the object are a number of
> > >> > dataset-wide metadata fields. These are all nested under a field
> > "view",
> > >> > with the main data I am guessing you want to analyze nested under
> the
> > >> field
> > >> > "data" in an array. While this format is not ideal for Drill, with
> the
> > >> size
> > >> > of the dataset you might be able to get it working with an operator
> in
> > >> > Drill that could help make the data more accessible.
> > >> >
> > >> > The operator is called flatten, and is designed to take an array and
> > >> > produce individual records for each element in the array. Optionally
> > >> other
> > >> > fields from the record can be included alongside each of the newly
> > >> spawned
> > >> > records to maintain a relationship between the incoming fields in
> the
> > >> > output of flatten.
> > >> >
> > >> > For more info on flatten, see this page in the wiki:
> > >> > https://cwiki.apache.org/confluence/display/DRILL/FLATTEN+Function
> > >> >
> > >> > For this dataset, you might be able to get access to the data simply
> > by
> > >> > running the following:
> > >> >
> > >> > select flatten(data) from dfs.`/path/to/file.json`;
> > >> >
> > >> > If you need to have access to some of the other fields from the top
> of
> > >> the
> > >> > dataset, you can include them alongside flatten and they will be
> > copied
> > >> > into each record produced by the flatten operation:
> > >> >
> > >> > select flatten(data), view.id, view.category from
> > >> > dfs.`/path/to/file.json`;
> > >> >
> > >> >
> > >> >
> > >> > On Wed, Apr 1, 2015 at 10:52 PM, Muthu Pandi <muthu1086@gmail.com>
> > >> wrote:
> > >> >
> > >> >> Hi All
> > >> >>
> > >> >>
> > >> >>           Am new to the JSON format and exploring the same. I
had
> > used
> > >> >> Drill to analyse simple JSON files which work like a charm, but
am
> > not
> > >> able
> > >> >> to load the this "
> > >> >>
> > >>
> >
> https://opendata.socrata.com/api/views/n2rk-fwkj/rows.json?accessType=DOWNLOAD
> > >> "
> > >> >>  JSON file for analysis.
> > >> >>
> > >> >> Am using ODBC connector to connect to the 0.8 Drill. Kindly find
> the
> > >> >> attachment for the error.
> > >> >>
> > >> >>
> > >> >>
> > >> >> *RegardsMuthupandi.K*
> > >> >>
> > >> >>  Think before you print.
> > >> >>
> > >> >>
> > >> >>
> > >> >
> > >>
> > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message