drill-user mailing list archives

From Paul Rogers <prog...@mapr.com>
Subject Re: Pushing down Joins, Aggregates and filters, and data distribution questions
Date Fri, 02 Jun 2017 05:10:40 GMT
Hi Muhammad,

> I have a couple of questions:
>   1. If I have multiple *SubScan*s to be executed, will each *SubScan* be
>   handled by a single *Scan* operator ? So whenever I have *n* *SubScan*s,
>   I'll have *n* Scan operators distributed among Drill's cluster ?

As Rahul explained, subscans are assigned to fragments. Let’s say that three subscans were assigned
to the same fragment. In this case, a single scan operator handles all three. Your “Scan
Batch Creator” will create a separate “Record Reader” for each subscan and hand them
to the scan operator. The scan operator then opens, reads, and closes each reader in turn.
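
To make that flow concrete, here is a rough, self-contained sketch of the idea. It is not Drill's actual code: the names ScanSketch, SubScan, Reader, createReaders and scan are hypothetical stand-ins for the subscan, record reader, scan batch creator and scan operator roles described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only; these are NOT Drill's real classes or APIs.
class ScanSketch {

    /** Stand-in for a subscan: one unit of scan work assigned to a fragment. */
    static final class SubScan {
        final String name;
        final List<String> rows;

        SubScan(String name, String... rows) {
            this.name = name;
            this.rows = Arrays.asList(rows);
        }
    }

    /** Stand-in for a record reader: reads the rows of exactly one subscan. */
    static final class Reader {
        private final SubScan subScan;
        private int next;
        private boolean open;

        Reader(SubScan subScan) {
            this.subScan = subScan;
        }

        void open() {
            open = true;
            next = 0;
        }

        /** Returns the next row, or null once the subscan is exhausted. */
        String read() {
            if (!open) {
                throw new IllegalStateException("reader not open");
            }
            return next < subScan.rows.size() ? subScan.rows.get(next++) : null;
        }

        void close() {
            open = false;
        }
    }

    /** Plays the scan batch creator role: builds one reader per subscan. */
    static List<Reader> createReaders(List<SubScan> subScans) {
        List<Reader> readers = new ArrayList<>();
        for (SubScan s : subScans) {
            readers.add(new Reader(s));
        }
        return readers;
    }

    /** Plays the single scan operator role: open, read, close each reader in turn. */
    static List<String> scan(List<Reader> readers) {
        List<String> out = new ArrayList<>();
        for (Reader r : readers) {
            r.open();
            try {
                for (String row = r.read(); row != null; row = r.read()) {
                    out.add(row);
                }
            } finally {
                r.close();
            }
        }
        return out;
    }
}
```

With three subscans in one fragment, a single call to scan(createReaders(subScans)) drains all three readers sequentially, which is the "one scan operator, many readers" behavior described above.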

>   2. How can I control the amount of any type of physical operators per
>   Drill cluster or node ? For instance, what if I want to have less
>   *Filter* operators or more *Scan* operators, how can I do that ?

I’ve not seen anything that suggests that this is possible. Drill groups operators into
fragments, then parallelizes the fragments. To accomplish what you want, you’d need to figure
out how Drill slices the DAG into fragments and adjust the slicing to isolate the operators
as you desire. Network exchanges would then connect your custom fragments.

Parallelization is generic for all fragments as Rahul explained; I’ve seen nothing that
suggests we have a way to identify different categories of fragments and apply different parallelization
rules to each.
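
As a toy illustration of why per-operator counts are hard to control: if parallelization is applied to a fragment as a whole, every operator in that fragment ends up with the same number of instances. The sketch below only models that arithmetic; FragmentSketch and instanceCounts are hypothetical names, and "width" is my assumed term for the number of parallel copies of a fragment.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model only; it does not call or mirror any Drill API.
class FragmentSketch {

    /**
     * Counts operator instances when a fragment containing the given
     * operators is run as `width` parallel copies. Every operator in the
     * fragment is replicated once per copy.
     */
    static Map<String, Integer> instanceCounts(List<String> operators, int width) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String op : operators) {
            counts.merge(op, width, Integer::sum);
        }
        return counts;
    }
}
```

Under this model, a fragment holding a Scan and a Filter run at width 4 yields four of each. Getting fewer Filters than Scans would require the two to sit in different fragments, separated by an exchange.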

Maybe there is some Calcite magic available?

- Paul
