drill-user mailing list archives

From Abdel Hakim Deneche <adene...@maprtech.com>
Subject Re: Query Planning and Directory Pruning
Date Thu, 04 Feb 2016 16:31:02 GMT
Hey John, can you run an explain plan for both queries and see how much
time each one takes?

For example, for the first query you would run:

explain plan for select count(1) from `data/2016-02-03`;

It would also help if you could share the query profiles for both
queries.
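
For the second query, the equivalent statement would presumably look like
this (a sketch that assumes the `data_sum` path from your message resolves
in the same workspace as the first one):

```sql
-- Plan only, no execution: the path is taken from the original message.
explain plan for select count(1) from `data_sum/2016-02-03`;
```

Comparing how long the two explain statements take should indicate whether
the extra seconds are spent in planning rather than in execution.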

Thanks

On Thu, Feb 4, 2016 at 8:15 AM, John Omernik <john@omernik.com> wrote:

> Hey all, I think I am seeing an issue related to
> https://issues.apache.org/jira/browse/DRILL-3759 but I want to describe it
> here, see if that's really the case, and then determine what the blockers
> to resolution may be.
>
> I am using the MapR Developer Release 1.4, and I have a directory with
> subdirectories by date.
>
> data/2015-01-01
> data/2015-01-02
> data/2015-01-03
>
> These are stored as Parquet files.  At this point each date averages about
> 1 GB of data and has roughly 75 Parquet files in it.
>
> When I run
>
> select count(1) from `data/2016-02-03` it takes roughly 11 seconds.
>
> If I copy the 2016-02-03 directory to a new base (data_sum) and run
>
> select count(1) from `data_sum/2016-02-03` it runs in 0.874 seconds.
>
> Same data, same structure; the only difference is that the data_sum
> directory has only a few directories, and data has dates going back to
> Nov 2015.  It seems like it is getting the file names for all files in
> each directory prior to pruning, which seems to me to be adding a lot of
> latency to queries that doesn't need to be there.  (Thus I think I am
> seeing 3759.)  I wanted to confirm that, and then see how we can address
> it, since the directory prune should be fast, and on large data sets it's
> just going to get worse and worse.
>
>
>
> John
>



-- 

Abdelhakim Deneche

Software Engineer

