drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From PRAVEEN DEVERACHETTY <pravee...@gmail.com>
Subject Re: Performace issue
Date Wed, 13 Feb 2019 06:13:45 GMT
Our json data has 5000 objects, each object has around 40 attributes. Our
data does not have any child rows, the reason we are using FLATTEN because
we are sending the data using rest api post method. Using CONVERT_FROM
function to format it into json in the memory(no storage plugin), as it is
an array of json objects, we are flattenning to multiple rows. On the
flattened data, we are running query as discribed above.

Our data is already flat, there are no nested data. Yes i am passing all
the configuration and monitoring using jmc profiling.

 I suspect CONVERT_FROM function is causing the spike? not sure.

Our main requirement is to use drill with out any storage plugin as it
leads to security issues having data in the disk. We are looking a solution
to run queries using in-memory solution, like generate data from db and
pass this data as json string to drill using rest-api, along with the ansi
query on the json string. Is there anyway i can write plugin for this
response and convert it to json? looks like convert_from function is not
working as planned for us.

On Tue, Feb 12, 2019 at 11:32 PM Kunal Khatua <kunal@apache.org> wrote:

> You'll need to edit the memory settings in DRILL_HOME/conf/drill-env.sh
> I suspect that your 5MB JSON data might be having a lot of objects, which
> need to be serialized in memory.
>
> FLATTEN has the problem that it replicates the data parent data for each
> child node that is being flattened into a row... so the resulting data
> being constructed in memory can grow significantly.
> One way to work around (not elegant, but worth trying) would be to
> generate intermediate flatten data and write temporary (if not using WebUI)
> tables and keep flattening out those records until you have a fully
> flattened dataset to work with directly.
>
> On 2/11/2019 10:37:58 PM, PRAVEEN DEVERACHETTY <praveendk@gmail.com>
> wrote:
> Thnks a lot Kunal. I am looking into that. I have one observation.
>
> With out flatten also, i tried to run a query of size 5MB, it is taking 5GB
> of heap? how do i control heap? Are there any settings i can modify. i am
> reading a lot, but nothing is working for me. It would be helpful how to
> control heap, i modified memory parameters based on the documentation, it
> is not working yet. it would be really helpful if i get some help in this
> regard. Thanks in advance.
>
> Regards
> Praveen
>
> On Tue, Feb 12, 2019 at 11:18 AM Kunal Khatua wrote:
>
> > This is a good starting point for understanding LATERAL-UNNEST and how it
> > compares to the FLATTEN operator.
> >
> > https://drill.apache.org/docs/lateral-join/
> >
> >
> > On 2/11/2019 9:03:42 PM, PRAVEEN DEVERACHETTY wrote:
> > Thanks Kunal.
> > i am not getting how to use lateral-unrest as dataset does not have child
> > rows. All data is in array of json objects(as mentioned below). There are
> > two json objects separated by comma and enclosed in squre bracket.
> >
> >
> [{"Location":"100","FirstName":"test1"},{"Location":"100","FirstName":"test2"},{"Location":"101","FirstName":"test3"}]
> >
> > We are using drill from Java. Through a rest invocation. Not using json
> > files. All data is sent over post as string. We are using convert_from
> > function in the query to convert into json objects. As we are sending
> array
> > of json objects, using FLATTEN operator to convert into multiple rows. is
> > there any way to avoid Flatten, as we see huge spike for 54MB data, going
> > to 24GB and still failing with heap error. not sure what is wrong. Can i
> > use FLATTEN on the entire data set? There are almost 54K records that is
> > getting FLATTENED.
> >
> > example query: 1)first converted into array of json objects 2) flatten to
> > convert into multiple rows
> > select ems.* from (select flatten(t.jdata) as record from (select
> >
> >
> convert_from('[{"Location":"100","FirstName":"test1"},{"Location":"100","FirstName":"test2"},{"Location":"101","FirstName":"test3"}..]')
> > as jdata) as t) ems
> >
> >
> > On Sat, Feb 9, 2019 at 1:37 AM Kunal Khatua wrote:
> >
> > > The memory (heap) would climb as it tries to flatten the JSON data.
> Have
> > > you tried looking at Drill's LateralJoin-Unnest feature? It was meant
> to
> > > address memory issues for some use cases of the FLATTEN operator.
> > >
> > > On 2/8/2019 5:17:01 AM, PRAVEEN DEVERACHETTY wrote:
> > > I am running a query with UNION ALL. as below
> > >
> > > select
> > > from ( select FLATTEN(t.jdata) as record from
> > > ((select convert_from(json string, json) union all
> > > (select conver_from(json_string,json) union all
> > > ...
> > > ) as jdata) ) as t) ems
> > >
> > > Reason for giving union all is because we are invoking a call using
> rest
> > > app, there is limitation of 20,000 when we use convert_from function.
> Our
> > > heap size is 8GB, server is 8core. From profiling, it shows this
> > perticula
> > > query spikes from 100MB to 8GB continuously. is there anything i am
> > > doing wrong?.
> > >
> > > Thanks,
> > > Prveen
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message