drill-user mailing list archives

From Adam Gilmore <dragoncu...@gmail.com>
Subject Re: Query planning cost
Date Thu, 07 May 2015 06:13:29 GMT
Just a follow-up: I have found that the delay grows almost linearly with
the number of Parquet files.  Reading the Parquet footers is quite expensive
and is not parallelised at all (Drill reads them during query planning).
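In principle the footer reads described above could be overlapped rather than done one after another. Below is a minimal sketch that assumes local files and plain threads. It is not Drill's code, which is Java. A Parquet file ends with the footer, then a 4-byte little-endian footer length, then the magic bytes `PAR1`, so each footer can be located with two small reads. A thread pool lets those I/O-bound reads run concurrently. `read_footer_bytes` and `read_footers_parallel` are hypothetical names, and decoding the Thrift-encoded metadata is left out.

```python
import os
import struct
from concurrent.futures import ThreadPoolExecutor

MAGIC = b"PAR1"


def read_footer_bytes(path):
    """Return the raw (still Thrift-encoded) footer of one Parquet file."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Smallest possible file: leading PAR1 + 4-byte length + trailing PAR1.
        if size < 12:
            raise ValueError(f"{path}: too small to be a Parquet file")
        # File tail layout: <footer><4-byte LE footer length><PAR1>
        f.seek(size - 8)
        trailer = f.read(8)
        if trailer[4:] != MAGIC:
            raise ValueError(f"{path}: missing trailing PAR1 magic")
        (footer_len,) = struct.unpack("<I", trailer[:4])
        if footer_len > size - 12:
            raise ValueError(f"{path}: footer length {footer_len} is out of range")
        f.seek(size - 8 - footer_len)
        return f.read(footer_len)


def read_footers_parallel(paths, max_workers=8):
    """Read every footer concurrently; results keep the order of `paths`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_footer_bytes, paths))
```

Threads are a reasonable fit because the work is dominated by I/O latency rather than CPU. How much this would help in Drill depends on the file system, so treat it as an experiment rather than a fix.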

Is there any way to control the row group size when creating Parquet
files?  I could create fewer, larger files, but I still want the benefit of
smaller row groups (since I have just done the work on Parquet filter
pushdown).
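The trade-off in the question can be made concrete with some arithmetic. Below is a rough sketch that assumes two things: planning cost is dominated by one footer read per file (the near-linear behaviour reported above), and filter pushdown prunes at row-group granularity. Under those assumptions, fewer and larger files cut the number of footer reads, while a smaller row-group size keeps pruning fine-grained. `plan_layout` and its parameters are hypothetical. On the Drill side, I believe the `store.parquet.block-size` option sets the row-group size when Drill itself writes Parquet (for example via CREATE TABLE AS). Check the documentation for your version.

```python
from dataclasses import dataclass


def ceil_div(a, b):
    """Integer ceiling division without going through floats."""
    return -(-a // b)


@dataclass
class Layout:
    files: int                # one footer read per file at planning time
    row_groups_per_file: int  # pruning units inside each file
    total_row_groups: int     # upper bound: the last file may be partial


def plan_layout(total_bytes, file_bytes, row_group_bytes):
    """Estimate file and row-group counts for a dataset of total_bytes."""
    if min(total_bytes, file_bytes, row_group_bytes) <= 0:
        raise ValueError("sizes must be positive")
    if row_group_bytes > file_bytes:
        raise ValueError("a row group cannot be larger than its file")
    files = ceil_div(total_bytes, file_bytes)
    per_file = ceil_div(file_bytes, row_group_bytes)
    return Layout(files, per_file, files * per_file)
```

Take 4480 MiB of data as an illustration. It could be laid out as 70 files of 64 MiB, with one row group each. Or it could be laid out as 5 files of 1 GiB with 64 MiB row groups, giving 16 per file. The second layout needs 5 footer reads instead of 70 but still gives pushdown up to 80 row groups to skip.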

On Thu, May 7, 2015 at 4:08 PM, Adam Gilmore <dragoncurve@gmail.com> wrote:

> Hi guys,
>
> I've been looking at the speed of some of our queries and have noticed
> a significant delay before the query actually starts.
>
> For example, when querying about 70 Parquet files in a directory, Drill
> takes about 370 ms to start the first fragment.
>
> Since this time doesn't show up in the plan, it's very hard to see where
> exactly it is spent without instrumenting or debugging.
>
> How can I troubleshoot where Drill is spending these 370 ms?
>
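One generic way to answer the question without a debugger is to wrap each suspected pre-execution step in a timer and compare the totals. Below is a minimal sketch in Python. Drill itself is Java, so this only illustrates the bookkeeping. `PhaseTimer` is a hypothetical helper, and the phase names are placeholders for whatever steps are being measured, such as listing the directory or reading footers.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock time per named phase."""

    def __init__(self):
        self.totals = {}  # phase name -> seconds

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the phase raised an exception.
            elapsed = time.perf_counter() - start
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def report(self):
        """(name, milliseconds) pairs, slowest phase first."""
        return sorted(
            ((name, secs * 1000.0) for name, secs in self.totals.items()),
            key=lambda pair: pair[1],
            reverse=True,
        )
```

A phase that is entered several times, such as one footer read per file, accumulates into a single total. That makes a per-file cost easy to spot.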
