drill-user mailing list archives

From PRAVEEN DEVERACHETTY <pravee...@gmail.com>
Subject Re: Performace issue
Date Tue, 12 Feb 2019 06:37:37 GMT
Thanks a lot, Kunal. I am looking into that. I have one observation.

Even without FLATTEN, when I run a query of size 5MB, it takes 5GB
of heap. How do I control the heap? Are there any settings I can modify? I am
reading a lot, but nothing is working for me. I modified the memory
parameters based on the documentation, but it is still not working. It would
be really helpful if I could get some help in this regard. Thanks in advance.

Regards
Praveen

On Tue, Feb 12, 2019 at 11:18 AM Kunal Khatua <kunal@apache.org> wrote:

> This is a good starting point for understanding LATERAL-UNNEST and how it
> compares to the FLATTEN operator.
>
> https://drill.apache.org/docs/lateral-join/
>
>
> On 2/11/2019 9:03:42 PM, PRAVEEN DEVERACHETTY <praveendk@gmail.com> wrote:
> Thanks Kunal.
> I do not understand how to use LATERAL-UNNEST, as the dataset does not have
> child rows. All data is in an array of JSON objects (as shown below): the
> objects are separated by commas and enclosed in square brackets.
>
> [{"Location":"100","FirstName":"test1"},{"Location":"100","FirstName":"test2"},{"Location":"101","FirstName":"test3"}]
>
> We are using Drill from Java, through a REST invocation, not JSON
> files. All data is sent over POST as a string. We use the convert_from
> function in the query to convert it into JSON objects. As we are sending an
> array of JSON objects, we use the FLATTEN operator to convert it into
> multiple rows. Is there any way to avoid FLATTEN? We see a huge spike for
> 54MB of data, going to 24GB and still failing with a heap error. I am not
> sure what is wrong. Can I use FLATTEN on the entire data set? Almost 54K
> records are being flattened.
>
> Example query: 1) first convert into an array of JSON objects, 2) FLATTEN to
> convert into multiple rows
> select ems.* from (select flatten(t.jdata) as record from (select
> convert_from('[{"Location":"100","FirstName":"test1"},{"Location":"100","FirstName":"test2"},{"Location":"101","FirstName":"test3"}..]', 'JSON')
> as jdata) as t) ems
>
>
> On Sat, Feb 9, 2019 at 1:37 AM Kunal Khatua wrote:
>
> > The memory (heap) would climb as it tries to flatten the JSON data. Have
> > you tried looking at Drill's LateralJoin-Unnest feature? It was meant to
> > address memory issues for some use cases of the FLATTEN operator.
> >
> > On 2/8/2019 5:17:01 AM, PRAVEEN DEVERACHETTY wrote:
> > I am running a query with UNION ALL. as below
> >
> > select
> > from ( select FLATTEN(t.jdata) as record from
> > ((select convert_from(json_string, json) union all
> > (select convert_from(json_string, json) union all
> > ...
> > ) as jdata) ) as t) ems
> >
> > The reason for using UNION ALL is that we invoke the call through a REST
> > app, and there is a limitation of 20,000 when we use the convert_from
> > function. Our heap size is 8GB and the server has 8 cores. Profiling shows
> > this particular query spikes from 100MB to 8GB continuously. Is there
> > anything I am doing wrong?
> >
> > Thanks,
> > Praveen
> >
>
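
For readers following the LATERAL-UNNEST suggestion in this thread, here is a rough sketch of how the FLATTEN query might be rewritten and posted from Java. It assumes a Drill version with lateral join enabled, uses the Location and FirstName fields from the sample data above, and builds the JSON body that Drill's REST endpoint (POST /query.json) accepts. The class and method names are hypothetical, the query has not been run against the data in this thread, and whether it lowers heap usage for a large inline literal is not established here.

```java
/** Hypothetical helper; the class and method names are illustrative, not part of Drill. */
class LateralUnnestSketch {

    /** Wraps a string in single quotes for SQL, doubling any embedded single quotes. */
    static String sqlLiteral(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    /** Wraps a string in double quotes for JSON, escaping quotes, backslashes and control characters. */
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /**
     * Builds a LATERAL/UNNEST query over an inline JSON array.
     * Assumption: the array elements carry Location and FirstName fields,
     * as in the sample data quoted in the thread.
     */
    static String buildQuery(String jsonArray) {
        return "SELECT ems.`Location`, ems.`FirstName` "
             + "FROM (SELECT convert_from(" + sqlLiteral(jsonArray) + ", 'JSON') AS jdata) t, "
             + "LATERAL (SELECT u.rec.`Location` AS `Location`, "
             + "u.rec.`FirstName` AS `FirstName` "
             + "FROM UNNEST(t.jdata) u(rec)) ems";
    }

    /** Builds the request body for Drill's REST query endpoint. */
    static String buildPayload(String sql) {
        return "{\"queryType\":\"SQL\",\"query\":" + jsonString(sql) + "}";
    }

    public static void main(String[] args) {
        String data = "[{\"Location\":\"100\",\"FirstName\":\"test1\"},"
                    + "{\"Location\":\"101\",\"FirstName\":\"test3\"}]";
        // Print the body that would be sent with Content-Type: application/json.
        System.out.println(buildPayload(buildQuery(data)));
    }
}
```

The sketch only constructs strings, so it can be checked without a running Drillbit. Sending the payload would still need an HTTP client and the host and port of your own Drill web server.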
