drill-user mailing list archives

From John Omernik <j...@omernik.com>
Subject Re: Query Planning and Directory Pruning
Date Tue, 09 Feb 2016 15:32:41 GMT
This one seems to cover it:

https://issues.apache.org/jira/browse/DRILL-3759



On Tue, Feb 9, 2016 at 9:25 AM, Abdel Hakim Deneche <adeneche@maprtech.com>
wrote:

> Hi John,
>
> Sorry I didn't get back to you (I thought I did).
>
> No, I don't need the plan, I just wanted to confirm what was taking most of
> the time and you already confirmed it's the planning.
>
> Can you open a JIRA for this? This may be a known issue, but I'm not sure.
>
> Thanks
>
> On Tue, Feb 9, 2016 at 6:08 AM, John Omernik <john@omernik.com> wrote:
>
> > Abdel, do you still need the plans? As I said, if your table has any
> > decent amount of directories and files, it looks like the planning is
> > touching all the directories even though you are pruning. I can post
> > plans; however, I think in this case you'll find they are exactly the
> > same, and the only difference is that the longer query spends much more
> > time planning because it has more files to read.
> >
> >
> > On Thu, Feb 4, 2016 at 10:46 AM, John Omernik <john@omernik.com> wrote:
> >
> > > I can package up both plans for you if you need them (let me know if
> > > you still want them), but I can tell you the plans were EXACTLY the
> > > same. However, the data_sum table took 0.932 seconds to plan the query,
> > > and the data table (the one with all the extra data) took 11.379
> > > seconds to plan the query. That indicates to me the issue isn't in the
> > > plan that was created, but in the actual planning process. (Let me know
> > > if you disagree or still need to see the plan; like I said, the actual
> > > plans were exactly the same.)
> > >
> > >
> > > John.
> > >
> > >
> > > On Thu, Feb 4, 2016 at 10:31 AM, Abdel Hakim Deneche <
> > > adeneche@maprtech.com> wrote:
> > >
> > >> Hey John, can you try an explain plan for both queries and see how
> > >> much time it takes?
> > >>
> > >> For example, for the first query you would run:
> > >>
> > >> explain plan for select count(1) from `data/2016-02-03`;
> > >>
> > >> It can also be helpful if you could share the query profiles for both
> > >> queries.
> > >>
> > >> Thanks
> > >>
> > >> On Thu, Feb 4, 2016 at 8:15 AM, John Omernik <john@omernik.com> wrote:
> > >>
> > >> > Hey all, I think I am seeing an issue related to
> > >> > https://issues.apache.org/jira/browse/DRILL-3759, but I want to
> > >> > describe it here, see if that's really the case, and then determine
> > >> > what the blockers to resolution may be.
> > >> >
> > >> > I am using the MapR Developer Release 1.4, and I have a directory
> > >> > with subdirectories by date.
> > >> >
> > >> > data/2015-01-01
> > >> > data/2015-01-02
> > >> > data/2015-01-03
> > >> >
> > >> > These are stored as Parquet files. At this point each date averages
> > >> > about 1 GB of data and has roughly 75 Parquet files in it.
> > >> >
> > >> > When I run
> > >> >
> > >> > select count(1) from `data/2016-02-03` it takes roughly 11 seconds.
> > >> >
> > >> > If I copy the 2016-02-03 directory to a new base (data_sum) and run
> > >> >
> > >> > select count(1) from `data_sum/2016-02-03` it runs in 0.874 seconds.
> > >> >
> > >> > Same data, same structure; the only difference is that the data_sum
> > >> > directory only has a few directories, and data has dates going back
> > >> > to Nov 2015. It seems like it is getting file names for all files in
> > >> > each directory prior to pruning, which seems to me to be adding a lot
> > >> > of latency to queries that doesn't need to be there (thus I think I
> > >> > am seeing 3759). I wanted to confirm, and then I wanted to see how we
> > >> > can address this, in that the directory prune should be fast, and on
> > >> > large data sets it's just going to get worse and worse.
> > >> >
> > >> >
> > >> >
> > >> > John
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >>
> > >> Abdelhakim Deneche
> > >>
> > >> Software Engineer
> > >>
> > >>   <http://www.mapr.com/>
> > >>
> > >>
> > >> Now Available - Free Hadoop On-Demand Training
> > >> <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
> > >>
> > >
> > >
> >
>
>
>
>
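
The behavior hypothesized in this thread (file names being collected for every directory before pruning is applied) can be illustrated with a small sketch. This is not Drill's planner code and does not use any Drill API; it only simulates the two orderings on a local temporary directory. The function names, the 30-directory layout, and the empty placeholder files are all assumptions made for illustration; only the figure of roughly 75 files per directory comes from the thread.

```python
import os
import tempfile


def build_table(root, dates, files_per_dir):
    """Create root/<date>/part-<n>.parquet as empty placeholder files."""
    for date in dates:
        directory = os.path.join(root, date)
        os.makedirs(directory)
        for n in range(files_per_dir):
            open(os.path.join(directory, f"part-{n}.parquet"), "w").close()


def list_then_prune(root, wanted):
    """Enumerate every file under root, then keep only the wanted directory.

    Returns (files touched, files kept).
    """
    touched = 0
    kept = 0
    for dirpath, _dirs, files in os.walk(root):
        for _name in files:
            touched += 1
            if os.path.basename(dirpath) == wanted:
                kept += 1
    return touched, kept


def prune_then_list(root, wanted):
    """Resolve the wanted directory first, then enumerate only its files.

    Returns (files touched, files kept).
    """
    files = os.listdir(os.path.join(root, wanted))
    return len(files), len(files)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: 30 date directories with 75 files each.
    dates = [f"2016-01-{day:02d}" for day in range(1, 31)]
    build_table(tmp, dates, files_per_dir=75)
    eager = list_then_prune(tmp, "2016-01-15")
    lazy = prune_then_list(tmp, "2016-01-15")
    print("list then prune (touched, kept):", eager)
    print("prune then list (touched, kept):", lazy)
```

Both orderings keep the same 75 files, but the first touches every file in the table, so its cost grows with the number of date directories rather than with the size of the directory being queried. That matches the symptom reported above, where identical plans took very different amounts of time to produce.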
