drill-user mailing list archives

From Rafael Jaimes III <rafjai...@gmail.com>
Subject Re: Planning times
Date Thu, 04 Jun 2020 18:54:49 GMT
Hi Avner,

One way you might be able to optimize this is by changing the size
and number of the Parquet files. How many files do you have, and how
big are they? Do you know what the row group size is? What is the HDFS
block size on your storage?

There are probably many more intricate ways to improve performance
through the Drill settings, but I have not modified them.

- Rafael

On Thu, Jun 4, 2020 at 2:43 PM Avner Levy <avner.levy@gmail.com> wrote:
> I'm running Apache Drill (1.18 master branch) in a Docker container with
> data stored in Parquet files on S3.
> When I run queries, even the most simple ones such as:
> select name from `parquet/data/data.parquet` limit 1
> The "Planning" time is 0.7-1.5 sec while the "Execution" is only 0.112 sec.
> These proportions are maintained even if I run the same query multiple
> times in a row.
> Since I'm trying to minimize query times, I was wondering whether such
> planning times (compared to execution) make sense, and whether there is
> any way to reduce them (some plan caching mechanism)?
> Thanks,
>   Avner
