drill-user mailing list archives

From Sorabh Hamirwasia <sohami.apa...@gmail.com>
Subject Re: Performance issue
Date Tue, 12 Feb 2019 18:19:35 GMT
Hi Praveen,
Can you also share the schema of your entire dataset and the format in
which it's stored?

Thanks,
Sorabh

On Tue, Feb 12, 2019 at 10:02 AM Kunal Khatua <kunal@apache.org> wrote:

> You'll need to edit the memory settings in DRILL_HOME/conf/drill-env.sh.
> I suspect that your 5MB of JSON data may contain a lot of objects, which
> need to be serialized in memory.
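
[Editor's note: for reference, Drillbit heap is typically raised via the
DRILL_HEAP variable in that file; the sizes below are illustrative, not a
recommendation for this workload.]

```shell
# DRILL_HOME/conf/drill-env.sh -- illustrative sizes; tune for your host.
# Java heap, where JSON objects are materialized during serialization:
export DRILL_HEAP="8G"
# Off-heap direct memory used by value vectors during query execution:
export DRILL_MAX_DIRECT_MEMORY="10G"
```

Restart the Drillbit after editing the file for the settings to take effect.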
>
> FLATTEN has the problem that it replicates the parent data for each
> child node being flattened into a row... so the resulting data
> constructed in memory can grow significantly.
> One way to work around it (not elegant, but worth trying) would be to
> generate intermediate flattened data, write it out as temporary tables
> (if not using the WebUI), and keep flattening those records until you
> have a fully flattened dataset to work with directly.
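
[Editor's note: to make the replication concrete, here is a minimal sketch
with illustrative names, not taken from this thread. One parent row holding
a three-element array yields three output rows, each carrying a full copy
of the parent columns.]

```sql
SELECT t.loc, FLATTEN(t.names) AS name
FROM (SELECT '100' AS loc,
             convert_from('["test1","test2","test3"]', 'json') AS names) t;
-- Three rows come back, each repeating loc = '100'; every non-flattened
-- column is duplicated per array element, which is where the memory goes.
```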
>
> On 2/11/2019 10:37:58 PM, PRAVEEN DEVERACHETTY <praveendk@gmail.com>
> wrote:
> Thanks a lot, Kunal. I am looking into that. I have one observation.
>
> Even without flatten, a query over 5MB of data is taking 5GB of heap.
> How do I control heap usage? Are there any settings I can modify? I am
> reading a lot, but nothing is working for me. I modified the memory
> parameters based on the documentation, but it is not working yet. It
> would be really helpful to get some help in this regard. Thanks in
> advance.
>
> Regards
> Praveen
>
> On Tue, Feb 12, 2019 at 11:18 AM Kunal Khatua wrote:
>
> > This is a good starting point for understanding LATERAL-UNNEST and how it
> > compares to the FLATTEN operator.
> >
> > https://drill.apache.org/docs/lateral-join/
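
[Editor's note: as a rough sketch of the shape such a rewrite takes, with
illustrative table and column names — see the page above for the exact
syntax — the same flattening can be expressed so that child rows are
streamed rather than materialized all at once.]

```sql
-- FLATTEN form:
SELECT FLATTEN(t.jdata) AS record FROM source t;

-- LATERAL/UNNEST form:
SELECT u.* FROM source t,
LATERAL (SELECT * FROM UNNEST(t.jdata)) AS u;
```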
> >
> >
> > On 2/11/2019 9:03:42 PM, PRAVEEN DEVERACHETTY wrote:
> > Thanks, Kunal.
> > I am not able to figure out how to use LATERAL-UNNEST, as the dataset
> > does not have child rows. All the data is in an array of JSON objects
> > (as shown below): JSON objects separated by commas and enclosed in
> > square brackets.
> >
> >
> [{"Location":"100","FirstName":"test1"},{"Location":"100","FirstName":"test2"},{"Location":"101","FirstName":"test3"}]
> >
> > We are using Drill from Java, through a REST invocation; we are not
> > using JSON files. All data is sent over POST as a string. We use the
> > convert_from function in the query to convert it into JSON objects.
> > Since we are sending an array of JSON objects, we use the FLATTEN
> > operator to convert it into multiple rows. Is there any way to avoid
> > FLATTEN? We see a huge spike for 54MB of data, with heap going to 24GB
> > and still failing with a heap error; not sure what is wrong. Can I use
> > FLATTEN on the entire dataset? There are almost 54K records being
> > flattened.
> >
> > Example query: 1) first convert into an array of JSON objects, 2)
> > FLATTEN to convert into multiple rows:
> >
> > select ems.* from (select flatten(t.jdata) as record from (select
> > convert_from('[{"Location":"100","FirstName":"test1"},{"Location":"100","FirstName":"test2"},{"Location":"101","FirstName":"test3"}..]')
> > as jdata) as t) ems
> >
> >
> > On Sat, Feb 9, 2019 at 1:37 AM Kunal Khatua wrote:
> >
> > > The memory (heap) would climb as it tries to flatten the JSON data.
> > > Have you tried looking at Drill's LateralJoin-Unnest feature? It was
> > > meant to address memory issues for some use cases of the FLATTEN
> > > operator.
> > >
> > > On 2/8/2019 5:17:01 AM, PRAVEEN DEVERACHETTY wrote:
> > > I am running a query with UNION ALL, as below:
> > >
> > > select ems.* from (select FLATTEN(t.jdata) as record from
> > > (select convert_from(json_string, 'json') as jdata union all
> > > select convert_from(json_string, 'json') as jdata union all
> > > ...
> > > ) as t) ems
> > >
> > > The reason for using UNION ALL is that we are invoking the call
> > > through a REST app, and there is a limitation of 20,000 records when
> > > we use the convert_from function. Our heap size is 8GB and the server
> > > has 8 cores. Profiling shows this particular query's heap spiking
> > > from 100MB to 8GB continuously. Is there anything I am doing wrong?
> > >
> > > Thanks,
> > > Praveen
> > >
> >
>
