drill-user mailing list archives

From David Tucker <dtuc...@maprtech.com>
Subject Re: Query performance and clustering
Date Wed, 25 Mar 2015 15:38:08 GMT
I’ll second Andries’ comment about measuring performance in AWS: you should not expect
consistency there (especially with instance types that are smaller than a physical server,
such as the c3.xlarge instances you’re using).

How does the memory utilization look during your queries? Memory pressure often manifests
as CPU load, especially in the pathological case of excessive Java garbage collection.
Drill does an excellent job of keeping the data being queried out of the traditional Java
heap, but there can still be some pressure there. Check the drillbit logs and see whether
GCs are occurring more frequently as your query count goes up.
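
One rough way to check that is to count GC events per minute and compare a quiet period with the 30-agent run. The sketch below is only an illustration: it assumes GC logging is enabled on the drillbit JVMs with date stamps, so that each event line starts with an ISO-8601 timestamp followed by "[GC" or "[Full GC". The regular expression and the function name are hypothetical and would need adjusting to whatever your logs actually contain.

```python
import re
from collections import Counter

# Assumed line shape (JVM GC log with date stamps), for example:
# 2015-03-25T15:30:01.123+0000: 12.345: [GC (Allocation Failure) ...]
# The first group captures the timestamp truncated to the minute.
GC_LINE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2}.*\[(Full GC|GC)\b")


def gc_events_per_minute(lines):
    """Count GC events per minute; lines that do not match are ignored."""
    counts = Counter()
    for line in lines:
        match = GC_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)
```

Feeding it the lines of a GC log (for example `gc_events_per_minute(open(path))`, where `path` is wherever your JVM writes that log) gives a per-minute count. A count that climbs with the query rate would point toward heap pressure rather than pure compute load.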

— David


On Mar 25, 2015, at 8:09 AM, Andries Engelbrecht <aengelbrecht@maprtech.com> wrote:

> What version of Drill are you running?
> 
> It sounds like you are CPU bound, and the query time increases 10x with a 30x increase
> in concurrency (which looks pretty good at first glance)
> At a high level this seems to be pretty reasonable, hard to give more specifics without
> seeing the query profiles. What is consuming the most time (and resource) in the query profiles?
> Perhaps there are some gains to be had in optimizing the queries.
> 
> If the cluster is primarily used for Drill you may want to adjust the planner.width.max_per_node
> system parameter to consume more of the cores on the nodes.
> See what the current setting is in sys.options, and adjust it to no more than the number
> of cores on the node. Experimenting with this may help a bit.
> You also may want to experiment with planner.width.max_per_query.
> I have not looked into the queue mechanisms in detail yet, but it doesn’t seem that
> the cluster is having issues with how it is managing concurrency.
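
As a hedged illustration of the planner.width.max_per_node suggestion, the sketch below caps a requested width at the node's core count and builds the two Drill statements involved: a query against sys.options to inspect the current values, and an ALTER SYSTEM to change the setting. The helper names are hypothetical, and the figure of 4 cores per c3.xlarge comes from AWS's published instance specifications rather than from this thread.

```python
def clamp_width_per_node(requested, cores_per_node):
    """Return a per-node width no larger than the core count (minimum 1)."""
    return max(1, min(requested, cores_per_node))


def width_statements(requested, cores_per_node):
    """Build the SQL to inspect and then set planner.width.max_per_node."""
    width = clamp_width_per_node(requested, cores_per_node)
    return [
        # Inspect the current planner width options first.
        "SELECT * FROM sys.options WHERE name LIKE 'planner.width%';",
        # Then apply the capped value system-wide.
        f"ALTER SYSTEM SET `planner.width.max_per_node` = {width};",
    ]


# Assumption: 4 cores per node (c3.xlarge); a request for 8 is capped to 4.
for statement in width_statements(requested=8, cores_per_node=4):
    print(statement)
```

The statements are only printed here; they would be run through whichever Drill client you already use. Capping at the core count follows the advice above, and lower values may still be worth trying when many queries run concurrently.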
> 
> Keep in mind AWS can be inconsistent in terms of performance, so hard to measure exacts
> on a cloud platform.
> 
> —Andries
> 
> On Mar 25, 2015, at 5:44 AM, Adam Gilmore <dragoncurve@gmail.com> wrote:
> 
>> Hi all,
>> 
>> I'm doing some testing on query performance, especially in a clustered
>> environment.
>> 
>> The test data is 5 Parquet files with 2.2 million records in each file
>> (total of ~11m).
>> 
>> The cluster is an Amazon EMR cluster with a total of 10 drillbits
>> (c3.xlarge instances).
>> 
>> A single SUM() with a GROUP BY results in a ~700ms query.
>> 
>> We set up about 30 agents, each running a query every second (30 queries per
>> second in total), and query times rise to about 6-7 seconds.
>> 
>> The bottleneck seems to be entirely CPU based - all drillbits' CPUs are
>> fairly swamped.
>> 
>> Looking at the plans, the Parquet scan still performs fairly well, but the
>> hash aggregate gets gradually slower and slower (obviously competing for
>> CPU time).
>> 
>> Are these the expected query times for such a setup?  Is there anything
>> further I can investigate to gain more performance?
> 

