drill-user mailing list archives

From "Kunal Khatua" <ku...@apache.org>
Subject Re: Performance issue
Date Tue, 12 Feb 2019 05:48:19 GMT
This is a good starting point for understanding LATERAL-UNNEST and how it compares to the FLATTEN operator.


On 2/11/2019 9:03:42 PM, PRAVEEN DEVERACHETTY <praveendk@gmail.com> wrote:
Thanks, Kunal.
I don't understand how to use LATERAL-UNNEST here, because the dataset does not
have child rows. All the data is in an array of JSON objects, as described below:
two JSON objects separated by a comma and enclosed in square brackets.

We are calling Drill from Java through a REST invocation, not reading JSON
files. All the data is sent as a string in a POST request. We use the
convert_from function in the query to convert it into JSON objects, and because
we send an array of JSON objects, we use the FLATTEN operator to turn it into
multiple rows. Is there any way to avoid FLATTEN? For 54MB of data we see heap
usage spike to 24GB, and the query still fails with a heap error. I am not sure
what is wrong. Can I use FLATTEN on the entire data set? Almost 54K records are
being flattened.
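A minimal Java sketch of the REST call described above. It assumes Drill's REST API, which accepts a POST of a JSON body with `queryType` and `query` fields at `/query.json`, and a Drillbit web server at the default `http://localhost:8047`; the class and method names are hypothetical. It only builds the request (it does not send it). Because the SQL already embeds the JSON data, quotes and backslashes must be escaped a second time for the outer JSON request body.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class DrillRestRequest {

    // Escapes a string so it can be embedded as a JSON string value.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Other control characters use a 4-digit hex escape.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    // Body for Drill's /query.json endpoint: {"queryType":"SQL","query":"..."}
    static String buildBody(String sql) {
        return "{\"queryType\":\"SQL\",\"query\":\"" + jsonEscape(sql) + "\"}";
    }

    // Builds (but does not send) the POST request; baseUrl is an assumption.
    static HttpRequest buildRequest(String baseUrl, String sql) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/query.json"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(sql)))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildRequest("http://localhost:8047", "select * from sys.version");
        System.out.println(req.method() + " " + req.uri());
    }
}
```

The request would then be sent with a `java.net.http.HttpClient`; the response is JSON containing the result rows.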

Example query: 1) first convert the string into an array of JSON objects,
2) flatten it to convert it into multiple rows.

select ems.*
from (select flatten(t.jdata) as record
      from (select convert_from(<json string>, 'JSON') as jdata) as t) ems
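The two-step query above could be generated from Java roughly as follows. This is a sketch: the class and method names are hypothetical, single quotes in the data are escaped by doubling them (the standard SQL string-literal rule), and `from (values(1))` is added as an assumed one-row dummy source in case the Drill version in use requires a FROM clause.

```java
public class FlattenQueryBuilder {

    // Wraps a value in a single-quoted SQL string literal, doubling any
    // embedded single quotes (the standard SQL escaping rule).
    static String sqlLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Step 1: convert_from parses the JSON text into a single jdata value (the array).
    // Step 2: flatten emits one row per element of that array.
    // "from (values(1))" is an assumed one-row dummy source.
    static String buildFlattenQuery(String jsonArray) {
        return "select ems.* from ("
                + "select flatten(t.jdata) as record from ("
                + "select convert_from(" + sqlLiteral(jsonArray) + ", 'JSON') as jdata"
                + " from (values(1))"
                + ") as t) ems";
    }

    public static void main(String[] args) {
        String json = "[{\"id\":1,\"name\":\"O'Brien\"},{\"id\":2,\"name\":\"Lee\"}]";
        System.out.println(buildFlattenQuery(json));
    }
}
```

Note that the whole JSON array travels inside one SQL literal, so the query text grows with the data.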

On Sat, Feb 9, 2019 at 1:37 AM Kunal Khatua wrote:

> The memory (heap) would climb as it tries to flatten the JSON data. Have
> you tried looking at Drill's LateralJoin-Unnest feature? It was meant to
> address memory issues for some use cases of the FLATTEN operator.
> On 2/8/2019 5:17:01 AM, PRAVEEN DEVERACHETTY wrote:
> I am running a query with UNION ALL, as below:
> select ems.*
> from (select FLATTEN(t.jdata) as record
>       from ((select convert_from(<json string 1>, 'JSON') as jdata)
>             union all
>             (select convert_from(<json string 2>, 'JSON') as jdata)
>             union all
>             ...
>            ) as t) ems
> We use UNION ALL because we invoke the query from a REST app, and there is a
> limit of 20,000 when we use the convert_from function. Our heap size is 8GB
> and the server has 8 cores. Profiling shows the memory used by this particular
> query climbing continuously from 100MB to 8GB. Is there anything I am doing
> wrong?
> Thanks,
> Praveen
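A sketch of how the UNION ALL workaround above could be generated: the records are grouped into JSON-array strings that each stay under a size cap, and each group becomes one UNION ALL branch. The message does not say what unit the 20,000 limit uses, so `maxChars` is a hypothetical character cap; the class and method names are also hypothetical, and `from (values(1))` is an assumed one-row dummy source.

```java
import java.util.ArrayList;
import java.util.List;

public class UnionAllBatcher {

    // Groups serialized JSON objects into JSON-array strings ("[a,b,...]") whose
    // length stays within maxChars, measured on the JSON text before SQL escaping.
    // maxChars is a hypothetical cap; a record longer than the cap gets its own batch.
    static List<String> batch(List<String> records, int maxChars) {
        List<String> batches = new ArrayList<>();
        StringBuilder current = new StringBuilder("[");
        for (String rec : records) {
            boolean nonEmpty = current.length() > 1;
            // Length after adding a comma, this record and the closing bracket.
            if (nonEmpty && current.length() + 1 + rec.length() + 1 > maxChars) {
                batches.add(current.append(']').toString());
                current = new StringBuilder("[");
                nonEmpty = false;
            }
            if (nonEmpty) {
                current.append(',');
            }
            current.append(rec);
        }
        if (current.length() > 1) {
            batches.add(current.append(']').toString());
        }
        return batches;
    }

    // Single-quoted SQL literal with embedded single quotes doubled.
    static String sqlLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // One UNION ALL branch per batch, each yielding a jdata column, then FLATTEN.
    // "from (values(1))" is an assumed one-row dummy source.
    static String buildQuery(List<String> batches) {
        List<String> branches = new ArrayList<>();
        for (String b : batches) {
            branches.add("(select convert_from(" + sqlLiteral(b)
                    + ", 'JSON') as jdata from (values(1)))");
        }
        return "select ems.* from (select flatten(t.jdata) as record from ("
                + String.join(" union all ", branches) + ") as t) ems";
    }

    public static void main(String[] args) {
        List<String> records = List.of("{\"id\":1}", "{\"id\":2}", "{\"id\":3}");
        System.out.println(buildQuery(batch(records, 20000)));
    }
}
```

Sending each batch as its own REST query, instead of one large UNION ALL, is an alternative worth measuring. It keeps each query's input smaller, but whether it lowers Drill's peak heap for this workload is untested here.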
