drill-user mailing list archives

From Muthu Pandi <muthu1...@gmail.com>
Subject Re: Nested or Array JSON
Date Fri, 03 Apr 2015 12:41:08 GMT
Tried with Flatten but the result is the same. Kindly help with pointers.

"ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
SELECT * FROM `HDFS`.`root`.`./user/hadoop2/unclaimedaccount.json` LIMIT 100
[30024]Query execution error. Details:[
Query stopped., Needed to be in state INIT or IN_VARCHAR but in mode
IN_BIGINT [ 7185da78-7759-4a8d-aebb-005f067a12e7 on nn01:31010 ]

] "
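
Assuming the VARCHAR/BIGINT mismatch comes from mixed types inside the data
array (only my guess), would reading everything as text be the right
direction? A rough, untested sketch:

-- guess: read every JSON value as text to avoid the type-change error
ALTER SESSION SET `store.json.all_text_mode` = true;
SELECT flatten(data)
FROM `HDFS`.`root`.`./user/hadoop2/unclaimedaccount.json` LIMIT 100;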



Regards
Muthupandi.K

 Think before you print.



On Fri, Apr 3, 2015 at 10:12 AM, Muthu Pandi <muthu1086@gmail.com> wrote:

> Thank you, Jason, for your detailed answer.
>
> Will try to use Flatten on the data column and let you know the status.
>
> The error message I got from ODBC is:
>
> "ERROR [HY000] [MapR][Drill] (1040) Drill failed to execute the query:
> SELECT * FROM `HDFS`.`root`.`./user/hadoop2/unclaimedaccount.json` LIMIT 100
> [30024]Query execution error. Details:[
> Query stopped., Needed to be in state INIT or IN_VARCHAR but in mode
> IN_BIGINT [ 7185da78-7759-4a8d-aebb-005f067a12e7 on nn01:31010 ]
>
> ] "
>
> Is there any way to normalise or convert this nested data to simpler JSON
> so that I can play with Drill?
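>
> Would something along the following lines work to write out a flattened
> copy? Just a rough, untested sketch; it assumes a writable workspace such
> as dfs.tmp is configured, and the table name is made up:
>
> -- assumption: dfs.tmp is writable; `unclaimed_flat` is a hypothetical name
> ALTER SESSION SET `store.format` = 'json';
> CREATE TABLE dfs.tmp.`unclaimed_flat` AS
> SELECT flatten(data) AS rec
> FROM `HDFS`.`root`.`./user/hadoop2/unclaimedaccount.json`;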
>
>
>
> Regards
> Muthupandi.K
>
>  Think before you print.
>
>
>
> On Thu, Apr 2, 2015 at 9:23 PM, Jason Altekruse <altekrusejason@gmail.com>
> wrote:
>
>> To answer Andries' question, with an enhancement in the 0.8 release, there
>> should be no hard limit on the size of Drill records supported. That being
>> said, Drill is not fundamentally set up for processing enormous rows, so we
>> do not have a clear idea of the performance impact of working with such
>> datasets.
>>
>> This document is going to be read as a single record initially, and I
>> think the 0.8 release should be able to read it in. From there, flatten
>> should be able to produce individual records suitable for further
>> analysis; these records will be a more reasonable size and should give
>> you good performance.
>>
>> -Jason
>>
>> On Thu, Apr 2, 2015 at 8:49 AM, Jason Altekruse <altekrusejason@gmail.com>
>> wrote:
>>
>> > Hi Muthu,
>> >
>> > Welcome to the Drill community!
>> >
>> > Unfortunately the mailing list does not allow attachments; please copy
>> > the error log into a mail message and send it along.
>> >
>> > If you are working with the 0.7 version of Drill, I would recommend
>> > upgrading to the new 0.8 release that just came out; there were a lot of
>> > bug fixes and enhancements in the release.
>> >
>> > We're glad to hear you have been successful with your previous efforts
>> > with Drill. Unfortunately, Drill is not well suited for exploring
>> > datasets like the one you have linked to. By default, Drill supports
>> > records in the format accepted by MongoDB for bulk import, where
>> > individual records take the form of a JSON object.
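>> >
>> > For example, a file like the following, with one JSON object per line
>> > (the field names and values here are made up), would be read as two
>> > separate records:
>> >
>> > {"name": "account1", "amount": 100}
>> > {"name": "account2", "amount": 250}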
>> >
>> > Looking at this dataset, it follows a pattern we have seen before, but
>> > one that Drill is currently not well suited to work with. All of the
>> > data is in a single JSON object; at the top of the object are a number
>> > of dataset-wide metadata fields. These are all nested under a field
>> > "view", with the main data I am guessing you want to analyze nested
>> > under the field "data" in an array. While this format is not ideal for
>> > Drill, with the size of the dataset you might be able to get it working
>> > with an operator in Drill that could help make the data more accessible.
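>> >
>> > Very roughly, going by the description above, the file is shaped
>> > something like this (heavily simplified, contents omitted):
>> >
>> > {
>> >   "view": { ...dataset-wide metadata... },
>> >   "data": [ [...], [...], ... ]
>> > }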
>> >
>> > The operator is called flatten, and it is designed to take an array and
>> > produce an individual record for each element in the array. Optionally,
>> > other fields from the record can be included alongside each of the newly
>> > spawned records, to maintain the relationship between the incoming
>> > fields and the output of flatten.
>> >
>> > For more info on flatten, see this page in the wiki:
>> > https://cwiki.apache.org/confluence/display/DRILL/FLATTEN+Function
>> >
>> > For this dataset, you might be able to get access to the data simply by
>> > running the following:
>> >
>> > select flatten(data) from dfs.`/path/to/file.json`;
>> >
>> > If you need to have access to some of the other fields from the top of
>> > the dataset, you can include them alongside flatten and they will be copied
>> > into each record produced by the flatten operation:
>> >
>> > select flatten(data), view.id, view.category from
>> > dfs.`/path/to/file.json`;
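>> >
>> > If Drill has trouble resolving view.id or view.category written that
>> > way, a variant with a table alias (and backticks around view, since it
>> > is a reserved word) may be needed; this is untested:
>> >
>> > -- t is a table alias; backticks guard the reserved word `view`
>> > select flatten(t.data), t.`view`.`id`, t.`view`.`category`
>> > from dfs.`/path/to/file.json` t;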
>> >
>> >
>> >
>> > On Wed, Apr 1, 2015 at 10:52 PM, Muthu Pandi <muthu1086@gmail.com>
>> > wrote:
>> >
>> >> Hi All
>> >>
>> >>
>> >> Am new to the JSON format and exploring it. I had used Drill to
>> >> analyse simple JSON files, which worked like a charm, but am not able
>> >> to load this
>> >> "https://opendata.socrata.com/api/views/n2rk-fwkj/rows.json?accessType=DOWNLOAD"
>> >> JSON file for analysis.
>> >>
>> >> Am using the ODBC connector to connect to Drill 0.8. Kindly find the
>> >> attachment for the error.
>> >>
>> >>
>> >>
>> >> Regards
>> >> Muthupandi.K
>> >>
>> >>  Think before you print.
>> >>
>> >>
>> >>
>> >
>>
>
>
