drill-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From PRAVEEN DEVERACHETTY <pravee...@gmail.com>
Subject Re: Performace issue
Date Thu, 14 Feb 2019 17:54:10 GMT
On Thu, Feb 14, 2019 at 7:10 AM PRAVEEN DEVERACHETTY <praveendk@gmail.com>
wrote:

> HI Sorabh,
>
> Here is the sample query passed using REST API. This query is sent as a
> body and submit job(REST).  convert_from function  convert json string to
> json string object. Then flatten on the resultset defined in you query.
> Please let me know if anythng else is required. Our main gao
>
> SELECT
>   'EMPSLS' as recordType,
>   ems.record.revenueCenterName as rvcName,
>   ems.record.revenueCenterNum as rvcNum,
>   ems.record.revenueCenterMasterName as rvcMasterName
> FROM (
>
> select FLATTEN(t.jdata) as record from
> (
> (select
> convert_from('[{"revenueCenterName":"Restaurant","revenueCenterNum":1,"revenueCenterMasterName":null,"revenueCenterMasterNum":null},
> {"revenueCenterName":"Restaurant","revenueCenterNum":2,"revenueCenterMasterName":null,"revenueCenterMasterNum":null}]',
> 'json') as jdata)
> ) as t) ems
>
> On Wed, Feb 13, 2019 at 11:18 PM Sorabh Hamirwasia <shamirwasia@mapr.com>
> wrote:
>
>> Hi Praveen,
>> I am probably missing something here because I don't understand how are
>> you
>> feeding data to Drill in memory using the rest api. As you mentioned data
>> has to be stored on disk or some db for Drill to fetch it. Can you please
>> share the query profile for your query ?
>>
>> P.S. Attachments are not allowed through this mailing list. Would be great
>> if you can upload the profile somewhere (dropbox, gDrive, etc) and share
>> the link with us.
>>
>> Thanks,
>> Sorabh
>>
>> On Wed, Feb 13, 2019 at 7:58 AM PRAVEEN DEVERACHETTY <praveendk@gmail.com
>> >
>> wrote:
>>
>> > As per my understanding with Apache drill, it is based on the file store
>> > only. Please help me if i can create any plugins for the following use
>> case
>> > 1. Create a json object and push to Apache drill in memory(cache). I can
>> > create json object in java, and if any api available from drill to push
>> > this file in the memory?
>> > 2. Read the file from the the memory(cahce) and execute queries by using
>> > that json object from the memory instead of the from the file system or
>> any
>> > data store.
>> >
>> > is it possible to do with apache drill? It would be really helpful to
>> > understand the changes i need to make.
>> >
>> > thanks,
>> > Praveen
>> >
>> > On Wed, Feb 13, 2019 at 11:46 AM PRAVEEN DEVERACHETTY <
>> praveendk@gmail.com
>> > >
>> > wrote:
>> >
>> > > Hi Sorabh, Data is in json string format, sent over rest api. Using
>> > > convert_from function to convert json string to json array and flatten
>> > the
>> > > result array into multiple rows. Data is not stored in the disk. All
>> data
>> > > is in the memory.
>> > >
>> > > Thanks,
>> > > Praveen
>> > >
>> > > On Tue, Feb 12, 2019 at 11:49 PM Sorabh Hamirwasia <
>> > > sohami.apache@gmail.com> wrote:
>> > >
>> > >> Hi Praveen,
>> > >> Can you also share what is the schema of your entire dataset and in
>> what
>> > >> format it's stored?
>> > >>
>> > >> Thanks,
>> > >> Sorabh
>> > >>
>> > >> On Tue, Feb 12, 2019 at 10:02 AM Kunal Khatua <kunal@apache.org>
>> wrote:
>> > >>
>> > >> > You'll need to edit the memory settings in
>> > DRILL_HOME/conf/drill-env.sh
>> > >> > I suspect that your 5MB JSON data might be having a lot of objects,
>> > >> which
>> > >> > need to be serialized in memory.
>> > >> >
>> > >> > FLATTEN has the problem that it replicates the data parent data
for
>> > each
>> > >> > child node that is being flattened into a row... so the resulting
>> data
>> > >> > being constructed in memory can grow significantly.
>> > >> > One way to work around (not elegant, but worth trying) would be
to
>> > >> > generate intermediate flatten data and write temporary (if not
>> using
>> > >> WebUI)
>> > >> > tables and keep flattening out those records until you have a
fully
>> > >> > flattened dataset to work with directly.
>> > >> >
>> > >> > On 2/11/2019 10:37:58 PM, PRAVEEN DEVERACHETTY <
>> praveendk@gmail.com>
>> > >> > wrote:
>> > >> > Thnks a lot Kunal. I am looking into that. I have one observation.
>> > >> >
>> > >> > With out flatten also, i tried to run a query of size 5MB, it
is
>> > taking
>> > >> 5GB
>> > >> > of heap? how do i control heap? Are there any settings i can
>> modify. i
>> > >> am
>> > >> > reading a lot, but nothing is working for me. It would be helpful
>> how
>> > to
>> > >> > control heap, i modified memory parameters based on the
>> documentation,
>> > >> it
>> > >> > is not working yet. it would be really helpful if i get some help
>> in
>> > >> this
>> > >> > regard. Thanks in advance.
>> > >> >
>> > >> > Regards
>> > >> > Praveen
>> > >> >
>> > >> > On Tue, Feb 12, 2019 at 11:18 AM Kunal Khatua wrote:
>> > >> >
>> > >> > > This is a good starting point for understanding LATERAL-UNNEST
>> and
>> > >> how it
>> > >> > > compares to the FLATTEN operator.
>> > >> > >
>> > >> > > https://drill.apache.org/docs/lateral-join/
>> > >> > >
>> > >> > >
>> > >> > > On 2/11/2019 9:03:42 PM, PRAVEEN DEVERACHETTY wrote:
>> > >> > > Thanks Kunal.
>> > >> > > i am not getting how to use lateral-unrest as dataset does
not
>> have
>> > >> child
>> > >> > > rows. All data is in array of json objects(as mentioned below).
>> > There
>> > >> are
>> > >> > > two json objects separated by comma and enclosed in squre
>> bracket.
>> > >> > >
>> > >> > >
>> > >> >
>> > >>
>> >
>> [{"Location":"100","FirstName":"test1"},{"Location":"100","FirstName":"test2"},{"Location":"101","FirstName":"test3"}]
>> > >> > >
>> > >> > > We are using drill from Java. Through a rest invocation.
Not
>> using
>> > >> json
>> > >> > > files. All data is sent over post as string. We are using
>> > convert_from
>> > >> > > function in the query to convert into json objects. As we
are
>> > sending
>> > >> > array
>> > >> > > of json objects, using FLATTEN operator to convert into multiple
>> > >> rows. is
>> > >> > > there any way to avoid Flatten, as we see huge spike for
54MB
>> data,
>> > >> going
>> > >> > > to 24GB and still failing with heap error. not sure what
is
>> wrong.
>> > >> Can i
>> > >> > > use FLATTEN on the entire data set? There are almost 54K
records
>> > that
>> > >> is
>> > >> > > getting FLATTENED.
>> > >> > >
>> > >> > > example query: 1)first converted into array of json objects
2)
>> > >> flatten to
>> > >> > > convert into multiple rows
>> > >> > > select ems.* from (select flatten(t.jdata) as record from
(select
>> > >> > >
>> > >> > >
>> > >> >
>> > >>
>> >
>> convert_from('[{"Location":"100","FirstName":"test1"},{"Location":"100","FirstName":"test2"},{"Location":"101","FirstName":"test3"}..]')
>> > >> > > as jdata) as t) ems
>> > >> > >
>> > >> > >
>> > >> > > On Sat, Feb 9, 2019 at 1:37 AM Kunal Khatua wrote:
>> > >> > >
>> > >> > > > The memory (heap) would climb as it tries to flatten
the JSON
>> > data.
>> > >> > Have
>> > >> > > > you tried looking at Drill's LateralJoin-Unnest feature?
It was
>> > >> meant
>> > >> > to
>> > >> > > > address memory issues for some use cases of the FLATTEN
>> operator.
>> > >> > > >
>> > >> > > > On 2/8/2019 5:17:01 AM, PRAVEEN DEVERACHETTY wrote:
>> > >> > > > I am running a query with UNION ALL. as below
>> > >> > > >
>> > >> > > > select
>> > >> > > > from ( select FLATTEN(t.jdata) as record from
>> > >> > > > ((select convert_from(json string, json) union all
>> > >> > > > (select conver_from(json_string,json) union all
>> > >> > > > ...
>> > >> > > > ) as jdata) ) as t) ems
>> > >> > > >
>> > >> > > > Reason for giving union all is because we are invoking
a call
>> > using
>> > >> > rest
>> > >> > > > app, there is limitation of 20,000 when we use convert_from
>> > >> function.
>> > >> > Our
>> > >> > > > heap size is 8GB, server is 8core. From profiling, it
shows
>> this
>> > >> > > perticula
>> > >> > > > query spikes from 100MB to 8GB continuously. is there
anything
>> i
>> > am
>> > >> > > > doing wrong?.
>> > >> > > >
>> > >> > > > Thanks,
>> > >> > > > Prveen
>> > >> > > >
>> > >> > >
>> > >> >
>> > >>
>> > >
>> >
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message