drill-user mailing list archives

From Abdel Hakim Deneche <adene...@maprtech.com>
Subject Re: Query Planning and Directory Pruning
Date Tue, 09 Feb 2016 15:25:24 GMT
Hi John,

Sorry I didn't get back to you (I thought I did).

No, I don't need the plan. I just wanted to confirm what was taking most of
the time, and you already confirmed it's the planning.

Can you open a JIRA for this? This may be a known issue, but I'm not sure.

Thanks

On Tue, Feb 9, 2016 at 6:08 AM, John Omernik <john@omernik.com> wrote:

> Abdel, do you still need the plans? As I said, if your table has any decent
> number of directories and files, it looks like the planning is touching all
> the directories even though you are pruning. I can post plans; however, I
> think in this case you'll find they are exactly the same, and the only
> difference is that the longer query spends much more time planning because
> it has more files to read.
>
>
> On Thu, Feb 4, 2016 at 10:46 AM, John Omernik <john@omernik.com> wrote:
>
> > I can package up both plans for you if you need them (let me know if you
> > still want them), but I can tell you the plans were EXACTLY the same.
> > However, the data_sum table took 0.932 seconds to plan the query, and the
> > data table (the one with all the extra data) took 11.379 seconds to
> > plan the query. That indicates to me the issue isn't in the plan that was
> > created, but in the actual planning process. (Let me know if you disagree
> > or still need to see the plan; like I said, the actual plans were exactly
> > the same.)
> >
> >
> > John.
> >
> >
> > On Thu, Feb 4, 2016 at 10:31 AM, Abdel Hakim Deneche <
> > adeneche@maprtech.com> wrote:
> >
> >> Hey John, can you try an explain plan for both queries and see how much
> >> time it takes?
> >>
> >> for example, for the first query you would run:
> >>
> >> explain plan for select count(1) from `data/2016-02-03`;
> >>
> >> It would also be helpful if you could share the query profiles for both
> >> queries.
> >>
> >> Thanks
> >>
> >> On Thu, Feb 4, 2016 at 8:15 AM, John Omernik <john@omernik.com> wrote:
> >>
> >> > Hey all, I think I am seeing an issue related to
> >> > https://issues.apache.org/jira/browse/DRILL-3759, but I want to
> >> > describe it here, see if it's really the case, and then determine what
> >> > the blockers to resolution may be.
> >> >
> >> > I am using the MapR Developer Release 1.4, and I have a directory with
> >> > subdirectories by date.
> >> >
> >> > data/2015-01-01
> >> > data/2015-01-02
> >> > data/2015-01-03
> >> >
> >> > These are stored as Parquet files. At this point each date averages
> >> > about 1 GB of data and has roughly 75 Parquet files in it.
> >> >
> >> > When I run
> >> >
> >> > select count(1) from `data/2016-02-03` it takes roughly 11 seconds.
> >> >
> >> > If I copy the 2016-02-03 directory to a new base (data_sum) and run
> >> >
> >> > select count(1) from `data_sum/2016-02-03` it runs in 0.874 seconds.
> >> >
> >> > Same data, same structure; the only difference is that the data_sum
> >> > directory has only a few directories, and data has dates going back to
> >> > Nov 2015. It seems like it is getting the file names for all files in
> >> > each directory prior to pruning, which seems to me to be adding a lot
> >> > of latency to queries that doesn't need to be there. (Thus I think I am
> >> > seeing 3759.) I wanted to confirm, and then I wanted to see how we can
> >> > address this, in that the directory prune should be fast, and on large
> >> > data sets it's just going to get worse and worse.
> >> >
> >> >
> >> >
> >> > John
> >> >
> >>
> >>
> >>
> >> --
> >>
> >> Abdelhakim Deneche
> >>
> >> Software Engineer
> >>
> >>   <http://www.mapr.com/>
> >>
> >
> >
>



-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
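
[Editorial sketch, not part of the original message.] The behavior John
describes, where file names are collected for every directory before pruning,
can be illustrated with a toy model. The Python below is not Drill's planner
code. The function names (build_tree, list_then_prune, prune_then_list, demo)
and the directory layout are hypothetical, chosen only to show why the number
of directory listings grows with the size of the whole table when listing
happens before pruning, and stays constant when pruning happens first.

```python
import os
import tempfile


def build_tree(base, dates, files_per_dir):
    """Create base/<date>/part-N.parquet as empty placeholder files."""
    for date in dates:
        day_dir = os.path.join(base, date)
        os.makedirs(day_dir)
        for i in range(files_per_dir):
            open(os.path.join(day_dir, f"part-{i}.parquet"), "w").close()


def list_then_prune(base, wanted):
    """List every file under base, then keep only those under `wanted`.

    Returns (matching_files, number_of_directory_listings).
    """
    listings = 1  # the listing of base itself
    all_files = []
    for date in sorted(os.listdir(base)):
        day_dir = os.path.join(base, date)
        listings += 1  # one listing per date directory, wanted or not
        for name in sorted(os.listdir(day_dir)):
            all_files.append(os.path.join(day_dir, name))
    prefix = os.path.join(base, wanted) + os.sep
    return [f for f in all_files if f.startswith(prefix)], listings


def prune_then_list(base, wanted):
    """List only the requested subdirectory.

    Returns (matching_files, number_of_directory_listings).
    """
    day_dir = os.path.join(base, wanted)
    files = [os.path.join(day_dir, n) for n in sorted(os.listdir(day_dir))]
    return files, 1


def demo():
    """Compare both strategies on a small synthetic tree."""
    dates = [f"2016-01-{d:02d}" for d in range(1, 31)] + ["2016-02-03"]
    with tempfile.TemporaryDirectory() as base:
        build_tree(base, dates, files_per_dir=5)
        slow, slow_listings = list_then_prune(base, "2016-02-03")
        fast, fast_listings = prune_then_list(base, "2016-02-03")
    return slow == fast, slow_listings, fast_listings
```

Calling demo() returns whether both strategies found the same files, along
with the listing count for each. In this model the first strategy performs one
listing per date directory plus one for the base, while the second always
performs one, which matches the shape of the 11-second versus sub-second
planning times reported above without claiming to explain Drill's internals.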
