drill-user mailing list archives

From Jinfeng Ni <...@apache.org>
Subject Re: Performance degradation for UNION ALL parquet data sources
Date Thu, 08 Dec 2016 20:45:26 GMT
Can you please check the EXPLAIN PLAN output for both the original query
and the query against the view, and see whether the two query plans
differ? The difference might be caused by the UNION ALL operator, which
could lead to a different parallelization mode.
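For example, the two plans could be produced with Drill's `EXPLAIN PLAN FOR` and compared side by side. This is only a sketch that reuses the table and view names from the original message. The exact plan text depends on the Drill version. The useful comparison is whether the UNION ALL branches get parallelized the same way as the single-table scan.

```sql
-- Plan for the query against the single CTAS-converted Parquet table
EXPLAIN PLAN FOR
select action['login'], count(*) from dfs.datastore.events
group by action['login'];

-- Plan for the same query against the UNION ALL view over events_0..events_3
EXPLAIN PLAN FOR
select action['login'], count(*) from dfs.datastore.events_combined
group by action['login'];
```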


On Thu, Dec 8, 2016 at 9:08 AM, Alexander Reshetov
<alexander.v.reshetov@gmail.com> wrote:
> Hi,
>
> I have one data file, which I converted to Parquet with CTAS.
>
> It took about 35 seconds to execute the following query:
>
> select action['login'], count(*) from dfs.datastore.events group by
> action['login'];
>
> After splitting the original source into 4 equal parts, I created 4 views
> on these parts (events_0, events_1, events_2, events_3):
>
> create view dfs.datastore.events_combined as
> select t0.`timestamp` as event_time, t0.client_id, t0.action from
> dfs.datastore.events_0 t0
> union all
> select t1.`timestamp` as event_time, t1.client_id, t1.action from
> dfs.datastore.events_1 t1
> union all
> select t2.`timestamp` as event_time, t2.client_id, t2.action from
> dfs.datastore.events_2 t2
> union all
> select t3.`timestamp` as event_time, t3.client_id, t3.action from
> dfs.datastore.events_3 t3;
>
>
> When I run the same query against this view, it executes much slower:
> about 500 seconds.
>
> select action['login'], count(*) from dfs.datastore.events_combined
> group by action['login'];
>
>
> I expected roughly the same execution time, but it degraded drastically.
> What could cause this, and can it be solved somehow?
