drill-user mailing list archives

From Steven Phillips <sphill...@maprtech.com>
Subject Re: Drill favouring a particular Drillbit
Date Thu, 26 Mar 2015 02:12:35 GMT
Actually, I believe a query submitted through the REST interface will
instantiate a DrillClient, which uses the same ZKClusterCoordinator that
sqlline uses, and thus the foreman for the query is not necessarily the
drillbit it was submitted to. But I'm still not sure it's related to
DRILL-2512.

I'll wait for your additional info before speculating further.
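
A minimal sketch of the two connection paths in play here, assuming the
Drill JDBC driver is on the classpath; host names, ports, and the query are
placeholders. A zk= URL goes through the ZKClusterCoordinator, which picks
the drillbit that acts as foreman, while a drillbit= URL connects to one
specific node (which then acts as foreman), which can help when isolating a
hotspot like the one described below.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ForemanCheck {
      public static void main(String[] args) throws Exception {
        // Connecting through ZooKeeper: the client picks a drillbit from the
        // ones registered under /drill/drillbits1, and that drillbit acts as
        // the foreman for queries on this connection.
        String zkUrl = "jdbc:drill:zk=zk1.example.com:2181/drill/drillbits1";

        // Connecting to a drillbit directly (31010 is the default user port):
        // the named node is the foreman, which makes it easier to compare how
        // work is distributed when a specific node coordinates the query.
        String directUrl = "jdbc:drill:drillbit=datanode1.example.com:31010";

        String url = (args.length > 0 && args[0].equals("direct"))
            ? directUrl : zkUrl;

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             // sys.drillbits lists the drillbits currently registered with
             // ZooKeeper (assuming a Drill build that ships this system table).
             ResultSet rs = stmt.executeQuery("SELECT * FROM sys.drillbits")) {
          while (rs.next()) {
            System.out.println(rs.getString("hostname"));
          }
        }
      }
    }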

On Wed, Mar 25, 2015 at 6:54 PM, Adam Gilmore <dragoncurve@gmail.com> wrote:

> We actually set up a separate load balancer for port 8047 (we're submitting
> these queries via the REST API at the moment) so Zookeeper etc. is out of
> the equation, thus I doubt we're hitting DRILL-2512.
>
> When shutting down the "troublesome" drillbit, it starts parallelizing much
> more nicely again.  We even added 10+ nodes to the cluster and as long as
> that particular drillbit is shut down, it distributes very nicely.  The
> minute we start the drillbit on that node again, Drill starts swamping it
> with work.
>
> I'll shoot through the JSON profiles and some more information on the
> dataset etc. later today (Australian time!).
>
> On Thu, Mar 26, 2015 at 5:31 AM, Steven Phillips <sphillips@maprtech.com>
> wrote:
>
> > I didn't notice at first that Adam said "no matter who the foreman is".
> >
> > Another suspicion I have is that our current logic for assigning work
> > will assign to the exact same nodes every time we query a particular
> > table. Changing affinity factor may change it, but it will still be the
> > same every time. That is my suspicion, but I am not sure why shutting
> > down the drillbit would improve performance. I would expect that shutting
> > down the drillbit would result in a different drillbit becoming the
> > hotspot.
> >
> > On Wed, Mar 25, 2015 at 12:16 PM, Jacques Nadeau <jacques@apache.org>
> > wrote:
> >
> > > On Steven's point, the node that the client connects to is not
> > > currently randomized.  Given your description of the behavior, I'm not
> > > sure whether you're hitting 2512 or just general undesirable
> > > distribution.
> > >
> > > On Wed, Mar 25, 2015 at 10:18 AM, Steven Phillips <sphillips@maprtech.com> wrote:
> > >
> > > > This is a known issue:
> > > >
> > > > https://issues.apache.org/jira/browse/DRILL-2512
> > > >
> > > > On Wed, Mar 25, 2015 at 8:13 AM, Andries Engelbrecht <
> > > > aengelbrecht@maprtech.com> wrote:
> > > >
> > > > > What version of Drill are you running?
> > > > >
> > > > > Any hints when looking at the query profiles? Is the node that is
> > > > > being hammered the foreman for the queries, and are most of the
> > > > > major fragments tied to the foreman?
> > > > >
> > > > > —Andries
> > > > >
> > > > >
> > > > > On Mar 25, 2015, at 12:00 AM, Adam Gilmore <dragoncurve@gmail.com> wrote:
> > > > >
> > > > > > Hi guys,
> > > > > >
> > > > > > I'm trying to understand how this could be possible.  I have a
> > > > > > Hadoop cluster set up with a name node and two data nodes.  All
> > > > > > have identical specs in terms of CPU/RAM etc.
> > > > > >
> > > > > > The two data nodes have a replicated HDFS setup where I'm storing
> > > > > > some Parquet files.
> > > > > >
> > > > > > A Drill cluster (with Zookeeper) is running with Drillbits on all
> > > > > > three servers.
> > > > > >
> > > > > > When I submit a query to *any* of the Drillbits, no matter who
> > > > > > the foreman is, one particular data node gets picked to do the
> > > > > > vast majority of the work.
> > > > > >
> > > > > > We've even added three more task nodes to the cluster and
> > > > > > everything still puts a huge load on one particular server.
> > > > > >
> > > > > > There is nothing unique about this data node.  HDFS is fully
> > > > > > replicated (no unreplicated blocks) to the other data node.
> > > > > >
> > > > > > I know that Drill tries to get data locality, so I'm wondering if
> > > > > > this is the cause, but it is essentially swamping this data node
> > > > > > with 100% CPU usage while leaving the others barely doing any
> > > > > > work.
> > > > > >
> > > > > > As soon as we shut down the Drillbit on this data node, query
> > > > > > performance increases significantly.
> > > > > >
> > > > > > Any thoughts on how I can troubleshoot why Drill is picking that
> > > > > > particular node?
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > >  Steven Phillips
> > > >  Software Engineer
> > > >
> > > >  mapr.com
> > > >
> > >
> >
> >
> >
> > --
> >  Steven Phillips
> >  Software Engineer
> >
> >  mapr.com
> >
>
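
Since the queries above are being submitted over the REST interface on port
8047, here is a minimal sketch of that submission path, assuming the standard
/query.json endpoint; the load balancer host name and the query are
placeholders. As noted at the top of the thread, the drillbit that receives
the POST is not necessarily the foreman that ends up coordinating the work.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Scanner;

    public class RestSubmit {
      public static void main(String[] args) throws Exception {
        // POST a SQL query to a drillbit's (or load balancer's) REST endpoint.
        URL url = new URL("http://drill-lb.example.com:8047/query.json");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);

        // The REST API expects a JSON body with "queryType" and "query".
        String body = "{\"queryType\":\"SQL\","
            + "\"query\":\"SELECT * FROM dfs.`/data/parquet` LIMIT 10\"}";
        try (OutputStream os = conn.getOutputStream()) {
          os.write(body.getBytes(StandardCharsets.UTF_8));
        }

        // Print the JSON response returned by the drillbit.
        try (Scanner sc = new Scanner(conn.getInputStream(), "UTF-8")) {
          while (sc.hasNextLine()) {
            System.out.println(sc.nextLine());
          }
        }
      }
    }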



-- 
 Steven Phillips
 Software Engineer

 mapr.com
