drill-user mailing list archives

From Jim Bates <jba...@maprtech.com>
Subject Re: How do I make json files less painful
Date Thu, 19 Mar 2015 22:33:11 GMT
Ok, large JSON arrays kept crashing the VM OS, so I just flattened the
larger files with jq.

Example, in keeping with what I already mentioned:

jq '.[] | {MyArrayInTheFile: .[]}' MyFile.json > MyNewfile.json

result:
{ "MyArrayInTheFile": {"a":"1","b":"2"} }
{ "MyArrayInTheFile": {"a":"1","b":"2"} }
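For anyone without jq handy, the same per-line flattening can be sketched in Python. This is a rough equivalent, not Drill tooling; the file shape and values are just the placeholders from the example above:

```python
import json

# Same shape as the MyFile.json example above (values are placeholders).
doc = {"MyArrayInTheFile": [{"a": "1", "b": "2"}, {"a": "1", "b": "2"}]}

# Rough equivalent of: jq '.[] | {MyArrayInTheFile: .[]}' MyFile.json
# Wraps each array entry in its own small object and emits one JSON
# document per line, which keeps individual records small for Drill.
for entry in doc["MyArrayInTheFile"]:
    print(json.dumps({"MyArrayInTheFile": entry}))
```

To process a real file, you would load it with `json.load` and write the lines out to a new file instead of printing.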



On Thu, Mar 19, 2015 at 4:00 PM, Jim Bates <jbates@maprtech.com> wrote:

> On first look I could read all the files, but doing a flatten caused all
> kinds of bad things. The worst was a repeatable kernel panic.
>
> I think I'm back to making the initial files smaller in the larger file
> sets.
>
> I have some files that are say 100M in size. Each file is a single line
> array:
> {"MyArrayInTheFile":[{"a":"1","b":"2"},{"a":"1","b":"2"},...]}
>  What is the best way to represent that so it can be explored? Do I do
> what was suggested before and put each array entry on its own line?
> {"MyArrayInTheFile":[
> {"a":"1","b":"2"},
> {"a":"1","b":"2"},
> ...
> ]}
>
> What works best for the 0.8 code?
>
>
> On Thu, Mar 19, 2015 at 12:59 PM, Jim Bates <jbates@maprtech.com> wrote:
>
>> Ok, went to drill-0.8.0.31020-1 and it was 1000% better.
>>
>> On Thu, Mar 19, 2015 at 12:16 PM, Sudhakar Thota <sthota@maprtech.com>
>> wrote:
>>
>>> I got the same issue; engineering recommended I use drill-0.8.0.
>>>
>>> Sudhakar Thota
>>> Sent from my iPhone
>>>
>>> > On Mar 19, 2015, at 9:22 AM, Jim Bates <jbates@maprtech.com> wrote:
>>> >
>>> > I constantly, constantly, constantly hit this.
>>> >
>>> > I have json files that are just a huge collection of an array of json
>>> > objects
>>> >
>>> > example
>>> > "MyArrayInTheFile":
>>> > [{"a":"1","b":"2","c":"3"},{"a":"1","b":"2","c":"3"},...]
>>> >
>>> > My issue is that in exploring the data, I hit this:
>>> >
>>> > Query failed: Query stopped., Record was too large to copy into
>>> > vector. [ 39186288-2e01-408c-b886-dcee0a2c25c5 on maprdemo:31010 ]
>>> >
>>> > I can explore csv, tab, maprdb, hive at fairly large data sets and
>>> > limit the response to what fits in my system limitations, but not
>>> > json in this format.
>>> >
>>> > The two options I have come up with to move forward are:
>>> >
>>> >   1. I strip out 90% of the array values in a file and explore that
>>> >   to get to my view, then go to a larger system and see if I have
>>> >   enough to get the job done.
>>> >   2. Move to the larger system and explore there, taking resources
>>> >   that don't need to be spent on a science project.
>>> >
>>> > Hoping the smart people have a different option for me,
>>> >
>>> > Jim
>>>
>>
>>
>
