drill-user mailing list archives

From Paul Rogers <par0...@yahoo.com.INVALID>
Subject Re: query performance with unequal drillbits
Date Tue, 21 Aug 2018 21:14:42 GMT
Hi Scott,

Drillbit symmetry is built deep into Drill's distribution model: the planner assumes Drillbits
are equal. Changing this assumption is possible (you cited MapReduce as a system that handles
this case), but would require complex code changes:

* Distribute scan blocks based on locality, or include machine capability when attempting
to balance reads (weaker machines get fewer reads, say)?
* When determining the number of minor fragments (execution tasks), base this on the total
available slots? (With each machine having a number of slots determined by its configuration,
say.) This is easier for simple operators (filter, project), but gets trickier for things
like sorts and joins.
* Prefer more powerful machines for some operators such as sort? (Sort on machines with the
most memory, or a combination of memory and CPU)?
* Exclude weak nodes from being Foreman? (Or, dedicate such nodes to ONLY being Foreman?)
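
To make the "weaker machines get fewer reads" idea concrete, here is a minimal Python sketch that splits scan blocks in proportion to a per-node capability weight. This is not Drill code: the function name, the node names, and the weights are all hypothetical, and it ignores data locality entirely.

```python
def split_blocks_by_weight(num_blocks, weights):
    """Split num_blocks across nodes in proportion to their weights.

    Hypothetical helper, not part of Drill. `weights` maps a node name to a
    relative capability score (for example, cores or memory).
    """
    total = sum(weights.values())
    # Ideal (fractional) share of blocks for each node.
    exact = {node: num_blocks * w / total for node, w in weights.items()}
    # Start every node at the floor of its ideal share.
    shares = {node: int(x) for node, x in exact.items()}
    leftover = num_blocks - sum(shares.values())
    # Hand the remaining blocks to the nodes with the largest fractional part.
    by_fraction = sorted(exact, key=lambda n: exact[n] - shares[n], reverse=True)
    for node in by_fraction[:leftover]:
        shares[node] += 1
    return shares


if __name__ == "__main__":
    # Assumed weights: "big" is four times as capable as "small".
    print(split_blocks_by_weight(70, {"big": 4, "mid": 2, "small": 1}))
```

A real planner would also have to trade this off against locality, which is where much of the complexity comes from.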

As you can see, the scheduling algorithm for an asymmetric cluster would be very complex and
very hard to get right. I suspect that is why Drill went with the much simpler assumption:
symmetric nodes.

In fact, to support asymmetry well, Drill would likely need a different parallelizer design,
one that moves from treating the assignment of minor fragments to nodes as a simple slice &
dice activity to treating it more like YARN (or Kubernetes) does: as a process of assigning
tasks to slots using some kind of best-fit or bin-packing algorithm. Obviously not a trivial change!
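
As a rough illustration of the best-fit idea, the Python sketch below places tasks on nodes using a best-fit-decreasing heuristic over a single resource (memory). It is only a sketch under assumed inputs: the function, task names, and capacities are hypothetical, and it does not reflect how Drill, YARN, or Kubernetes actually schedule work.

```python
def best_fit_assign(tasks, capacity):
    """Assign tasks to nodes with a best-fit-decreasing heuristic.

    Hypothetical helper. `tasks` maps a task name to its memory need and
    `capacity` maps a node name to its free memory (same units).
    Returns (placement, free), where placement[task] is a node name,
    or None if no node currently has room for the task.
    """
    free = dict(capacity)
    placement = {}
    # Place the largest tasks first so they are not starved by small ones.
    for task, need in sorted(tasks.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [node for node, f in free.items() if f >= need]
        if not candidates:
            placement[task] = None  # would have to wait until a slot frees up
            continue
        # Best fit: the node left with the least spare room after placement.
        node = min(candidates, key=lambda n: free[n] - need)
        free[node] -= need
        placement[task] = node
    return placement, free


if __name__ == "__main__":
    tasks = {"sort": 10, "filter": 2, "project": 2, "join": 6}
    print(best_fit_assign(tasks, {"big": 16, "small": 4}))
```

Tasks that come back unplaced correspond to the "queue and assign later" behavior asked about in the original question. A real scheduler would also have to account for CPU, locality, and the data exchanges between fragments.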

For now, the best advice would be to configure all Drillbits to use the same amount of memory
and CPU. Use YARN to assign additional non-Drill tasks to larger nodes, while leaving Drill
as the only task on weaker nodes.

- Paul


    On Tuesday, August 21, 2018, 1:48:19 PM PDT, scott <tcots8888@gmail.com> wrote:
Hi community,
I am trying to find a way to tune Drill so that weaker drillbits get less
data to work on so that the weak link doesn't drag my performance down. I
have drillbits running on a variety of hardware and sometimes these shared
resources get really slow. It seems that the query plan always evenly
divides the data fragments so that each drillbit gets the same data to chew
on. How do I make it give weaker drillbits less data?

Alternatively, is there a way to limit and queue fragments of the query and
leave them unassigned, then assign to drillbits as their resources become
free, similar to MapReduce?

Thanks for your time,