drill-user mailing list archives

From Avner Levy <avner.l...@gmail.com>
Subject Re: Planning times
Date Sat, 06 Jun 2020 19:51:15 GMT
Hi Charles,
I'm using the master branch (1.18.0-SNAPSHOT) in Docker.
I've enabled the metastore for the session and ran the same query twice, but
I still got the following times.
Is there a way to predefine the table's schema so that the query time is
reduced?
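For reference, here is a minimal sketch of what enabling the metastore for one session involves. As far as I understand the Drill Metastore, setting the option alone does not populate it: the table's metadata is collected by an `ANALYZE TABLE ... REFRESH METADATA` run. The thread does not confirm whether this works for a single-file path like the one below, or whether it actually shortens planning here, so treat both as assumptions.

```sql
-- Enable the Drill Metastore for the current session only
ALTER SESSION SET `metastore.enabled` = true;

-- Collect the table's metadata (schema, file and row group info) into the Metastore.
-- Assumption: the path resolves against the same default workspace as the query,
-- and the Metastore accepts a single Parquet file here (otherwise use its directory).
ANALYZE TABLE `parquet/data.parquet` REFRESH METADATA;

-- Re-run the query in the same session so the planner can use the stored metadata
SELECT name FROM `parquet/data.parquet` LIMIT 1;
```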
The query is:
*select name from `parquet/data.parquet` limit 1*

Any idea why planning takes so long on such a trivial query?
Does it include accessing the file for schema discovery?
I'm specifying the exact filename in the queries in order to reduce the
file listing part.
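One way to check how much time the planner alone takes is to ask Drill for the plan without running the query, then compare that statement's elapsed time with the full query's. The sketch below reuses the same path. The thread does not confirm whether the planner reads the Parquet file's footer from S3 during this step.

```sql
-- Plans the query and returns the plan text; the query itself is not executed
EXPLAIN PLAN FOR
SELECT name FROM `parquet/data.parquet` LIMIT 1;
```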
Thanks for your help,
  Avner

Duration:
  Planning   Queued     Execution  Total
  0.683 sec  0.000 sec  0.090 sec  0.773 sec

Session Options:
  Name                Value
  metastore.enabled   true


On Thu, Jun 4, 2020 at 9:09 PM Charles Givre <cgivre@gmail.com> wrote:

> Hi Avner,
> Maybe you said this already, but what version of Drill are you using, and do
> you have the metastore enabled?
> --C
>
>
>
> > On Jun 4, 2020, at 9:02 PM, Avner Levy <avner.levy@gmail.com> wrote:
> >
> > Thanks, Rafael, for your answer.
> > As I wrote in the previous email, these planning times occur even when
> > selecting a single field from one tiny file (60k) that I pass directly by
> > full path (select name from `parquet/data/data.parquet` limit 1).
> > Any idea what can influence the time in such a trivial scenario?
> > In addition, doesn't Drill cache execution plans between executions of
> > similar queries?
> > Best regards,
> > Avner
> >
> >
> > On Thu, Jun 4, 2020 at 2:55 PM Rafael Jaimes III <rafjaimes@gmail.com> wrote:
> >
> >> Hi Avner,
> >>
> >> One way you might be able to optimize this is by modifying the size
> >> and number of the Parquet files. How many files do you have and how
> >> big are they? Do you know what the row group size is? What is the HDFS
> >> block size on your storage?
> >>
> >> There are probably many more intricate ways to improve performance with
> >> the Drill settings, but I have not modified them.
> >>
> >> - Rafael
> >>
> >> On Thu, Jun 4, 2020 at 2:43 PM Avner Levy <avner.levy@gmail.com> wrote:
> >>>
> >>> I'm running Apache Drill (1.18 master branch) in a Docker container with
> >>> data stored in Parquet files on S3.
> >>> When I run queries, even the most simple ones such as:
> >>>
> >>> select name from `parquet/data/data.parquet` limit 1
> >>>
> >>> The "Planning" time is 0.7-1.5 sec, while "Execution" takes only
> >>> 0.112 sec.
> >>> These proportions are maintained even if I run the same query multiple
> >>> times in a row.
> >>> Since I'm trying to keep query times to a minimum, I was wondering
> >>> whether such planning times (compared to execution) make sense, and
> >>> whether there is any way to reduce them (some plan caching mechanism?).
> >>> Thanks,
> >>>  Avner
> >>
>
>
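Rafael's questions about file size and row group size can be acted on from within Drill by rewriting the data with CTAS. This is a hedged sketch: `store.parquet.block-size` is Drill's option for the row group size, in bytes, of the Parquet files it writes. The 128 MB value and the target table `dfs.tmp.data_rewritten` are hypothetical choices for illustration. The thread does not show that row group size matters for a file this small.

```sql
-- Write CTAS output as Parquet (Drill's default, set here explicitly)
ALTER SESSION SET `store.format` = 'parquet';

-- Row group size, in bytes, for Parquet files written by Drill (hypothetical value: 128 MB)
ALTER SESSION SET `store.parquet.block-size` = 134217728;

-- Rewrite the data; dfs.tmp.`data_rewritten` is a hypothetical target table
CREATE TABLE dfs.tmp.`data_rewritten` AS
SELECT * FROM `parquet/data/data.parquet`;
```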
