drill-user mailing list archives

From "Kunal Khatua" <ku...@apache.org>
Subject Re: Multiple fragments in apache drill
Date Wed, 13 Feb 2019 19:22:25 GMT
Hi Hugues

The number of fragments is determined by the number of sources (i.e. whether the data can
be read in parallel) and the number of estimated rows.
CSV and Parquet files are easy to read in parallel, but JSON files are not, because Drill
does not know how many JSON documents exist in the file and where their offsets are.

The number of estimated rows tells Drill whether to parallelize a major fragment of operators.
You can try reducing this property in your session/system via the UI [/options page] : 

~ Kunal
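
The message above does not name the property it refers to. One Drill planner option that matches the description is `planner.slice_target` (default 100000), so the sketch below assumes that is the one meant. It builds the ALTER statement you could run instead of using the /options page, and includes a deliberately simplified model of how a row estimate might translate into a number of minor fragments. The helper names `alter_option_sql` and `estimated_width`, and the `max_width` cap, are hypothetical and for illustration only; Drill's real parallelizer considers more than this.

```python
# Assumption: the unnamed property in the message is planner.slice_target.
SLICE_TARGET = "planner.slice_target"


def alter_option_sql(name: str, value: int, scope: str = "SESSION") -> str:
    """Build an ALTER SESSION / ALTER SYSTEM statement for a numeric option."""
    scope = scope.upper()
    if scope not in ("SESSION", "SYSTEM"):
        raise ValueError("scope must be SESSION or SYSTEM")
    if value <= 0:
        raise ValueError("value must be a positive integer")
    # Drill quotes dotted option names with backticks.
    return f"ALTER {scope} SET `{name}` = {int(value)}"


def estimated_width(estimated_rows: int, slice_target: int, max_width: int) -> int:
    """Simplified model, NOT Drill's actual algorithm.

    Roughly one minor fragment per `slice_target` estimated rows,
    never fewer than 1 and never more than `max_width`.
    """
    if slice_target <= 0 or max_width <= 0:
        raise ValueError("slice_target and max_width must be positive")
    slices = -(-estimated_rows // slice_target)  # integer ceiling division
    return max(1, min(max_width, slices))


if __name__ == "__main__":
    # Statement to lower the threshold for the current session only.
    print(alter_option_sql(SLICE_TARGET, 10_000))
    # With the default threshold, a 50,000-row estimate stays serial in this model.
    print(estimated_width(50_000, 100_000, max_width=6))
    # With a lower threshold, the same estimate is split.
    print(estimated_width(50_000, 10_000, max_width=6))
```

Under this model, lowering the threshold below the planner's row estimate is what turns a single minor fragment into several. Whether a real query parallelizes also depends on the data source being splittable, as described above.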

On 2/13/2019 7:14:34 AM, Kwizera hugues Teddy <nbted2017@gmail.com> wrote:
Hello Drill team,

I'm executing a query on an Apache Drill cluster, but it produces only 1
minor fragment. I have tried various queries, such as a union of 2 queries
and aggregations, and run them on millions of records, but Drill
still creates only 1 fragment. Is there any configuration change I can
make to get multiple fragments, so that they can be executed on each
Drillbit individually? How can I confirm whether a query is being
executed on 1 Drillbit instance or on multiple instances?

- We are trying to compare Impala and Drill, but for the moment Impala is
faster than Drill.

- Environment:

Drill on YARN with 6 Drillbits.

Regards,
Hugues Teddy
