drill-user mailing list archives

From Jacques Nadeau <jacq...@apache.org>
Subject Re: How do I make json files less painful
Date Thu, 19 Mar 2015 23:16:37 GMT
Kernel panic?  Can you try to share the information that causes this?  Are
you running out of memory?  What type of system are you running on?

On Thu, Mar 19, 2015 at 2:00 PM, Jim Bates <jbates@maprtech.com> wrote:

> On first look I could read all the files, but doing a flatten caused all
> kinds of problems. The worst was a repeatable kernel panic.
>
> I think I'm back to making the initial files smaller in the larger file
> sets.
>
> I have some files that are say 100M in size. Each file is a single line
> array:
> {"MyArrayInTheFile":[{"a":"1","b":"2"},{"a":"1","b":"2"},...]}
> What is the best way to represent that so it can be explored? Do I do what
> was suggested before and put each array entry on its own line?
> {"MyArrayInTheFile":[
> {"a":"1","b":"2"},
> {"a":"1","b":"2"},
> ...
> ]}
>
> What works best for the 0.8 code?
>
>
> On Thu, Mar 19, 2015 at 12:59 PM, Jim Bates <jbates@maprtech.com> wrote:
>
> > Ok, went to drill-0.8.0.31020-1 and it was 1000% better.
> >
> > On Thu, Mar 19, 2015 at 12:16 PM, Sudhakar Thota <sthota@maprtech.com>
> > wrote:
> >
> >> I got the same issue; engineering recommended I use drill-0.8.0.
> >>
> >> Sudhakar Thota
> >> Sent from my iPhone
> >>
> >> > On Mar 19, 2015, at 9:22 AM, Jim Bates <jbates@maprtech.com> wrote:
> >> >
> >> > I constantly, constantly, constantly hit this.
> >> >
> >> > I have json files that are just a huge collection of an array of json
> >> > objects
> >> >
> >> > example
> >> > "MyArrayInTheFile":
> >> > [{"a":"1","b":"2","c":"3"},{"a":"1","b":"2","c":"3"},...]
> >> >
> >> > My issue is that in exploring the data, I hit this:
> >> >
> >> > Query failed: Query stopped., Record was too large to copy into
> >> > vector. [ 39186288-2e01-408c-b886-dcee0a2c25c5 on maprdemo:31010 ]
> >> >
> >> > I can explore csv, tab, maprdb, hive at fairly large data sets and
> >> > limit the response to what fits within my system limitations, but not
> >> > json in this format.
> >> >
> >> > The two options I have come up with to move forward are:
> >> >
> >> >   1. Strip out 90% of the array values in a file and explore that
> >> >      to get to my view, then go to a larger system and see if I have
> >> >      enough to get the job done.
> >> >   2. Move to the larger system and explore there, taking resources
> >> >      that don't need to be spent on a science project.
> >> >
> >> > Hoping the smart people have a different option for me,
> >> >
> >> > Jim
> >>
> >
> >
>
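[Editor's note: the per-line layout suggested in the thread can be produced mechanically. Below is a minimal Python sketch, not from the thread itself; the `split_array_file` name is made up here, and it uses a plain in-memory `json.load`, so it assumes a single source file (e.g. ~100M) fits comfortably in RAM. It rewrites a single-line `{"MyArrayInTheFile":[...]}` file so each array element lands on its own line.]

```python
import json

def split_array_file(src_path, dst_path, key="MyArrayInTheFile"):
    """Rewrite {"key":[obj,obj,...]} (all on one line) so that each
    array element sits on its own line, keeping the JSON valid."""
    with open(src_path) as src:
        data = json.load(src)  # loads the whole file into memory

    elements = data[key]
    with open(dst_path, "w") as dst:
        dst.write('{"%s":[\n' % key)
        for i, obj in enumerate(elements):
            # comma after every element except the last, as in the example above
            sep = "," if i < len(elements) - 1 else ""
            dst.write(json.dumps(obj, separators=(",", ":")) + sep + "\n")
        dst.write("]}\n")
```

For files too large to hold in memory, the same idea would need a streaming JSON parser instead of `json.load`.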
