drill-user mailing list archives

From Jacques Nadeau <jacq...@apache.org>
Subject Re: Drill favouring a particular Drillbit
Date Wed, 25 Mar 2015 19:16:01 GMT
On Steven's point, the node that the client connects to is not currently
randomized. Given the behavior you describe, I'm not sure whether you're
hitting DRILL-2512 or just a generally undesirable work distribution.
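
The Drillbit a client connects to is the one that acts as foreman for its queries, so a fixed choice concentrates foreman work on one node. A minimal Python sketch of client-side randomization follows. It assumes the client already has the list of Drillbit endpoints (for example, read from ZooKeeper); the host names, the use of port 31010, and both helper functions are hypothetical illustrations, not Drill's client API.

```python
import random
from collections import Counter

# Hypothetical Drillbit endpoints. In a real cluster these would come from
# ZooKeeper or configuration; 31010 is assumed here as the user RPC port.
DRILLBITS = ["node1:31010", "node2:31010", "node3:31010"]


def pick_drillbit(endpoints, rng=random):
    """Choose the Drillbit to connect to uniformly at random, so the
    foreman role is spread across nodes instead of pinned to one."""
    if not endpoints:
        raise ValueError("no Drillbit endpoints available")
    return rng.choice(endpoints)


def first_drillbit(endpoints):
    """A fixed, non-randomized choice for comparison: always the first entry."""
    return endpoints[0]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the run is reproducible
    randomized = Counter(pick_drillbit(DRILLBITS, rng) for _ in range(300))
    fixed = Counter(first_drillbit(DRILLBITS) for _ in range(300))
    print("randomized:", dict(randomized))
    print("fixed:     ", dict(fixed))
```

Randomizing only the connection spreads the foreman role; it does not by itself change where the scan fragments run, which is why this alone may not explain a single hot data node.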

On Wed, Mar 25, 2015 at 10:18 AM, Steven Phillips <sphillips@maprtech.com>
wrote:

> This is a known issue:
>
> https://issues.apache.org/jira/browse/DRILL-2512
>
> On Wed, Mar 25, 2015 at 8:13 AM, Andries Engelbrecht <aengelbrecht@maprtech.com> wrote:
>
> > What version of Drill are you running?
> >
> > Any hints when looking at the query profiles? Is the node that is being
> > hammered the foreman for the queries and most of the major fragments are
> > tied to the foreman?
> >
> > —Andries
> >
> >
> > On Mar 25, 2015, at 12:00 AM, Adam Gilmore <dragoncurve@gmail.com> wrote:
> >
> > > Hi guys,
> > >
> > > I'm trying to understand how this could be possible. I have a Hadoop
> > > cluster set up with a name node and two data nodes. All have identical
> > > specs in terms of CPU/RAM etc.
> > >
> > > The two data nodes have a replicated HDFS setup where I'm storing some
> > > Parquet files.
> > >
> > > A Drill cluster (with ZooKeeper) is running with Drillbits on all three
> > > servers.
> > >
> > > When I submit a query to *any* of the Drillbits, no matter which node is
> > > the foreman, one particular data node gets picked to do the vast
> > > majority of the work.
> > >
> > > We've even added three more task nodes to the cluster, and everything
> > > still puts a huge load on one particular server.
> > >
> > > There is nothing unique about this data node. HDFS is fully replicated
> > > (no under-replicated blocks) to the other data node.
> > >
> > > I know that Drill aims for data locality, so I'm wondering if this is
> > > the cause, but it is essentially swamping this data node with 100% CPU
> > > usage while leaving the others barely doing any work.
> > >
> > > As soon as we shut down the Drillbit on this data node, query
> > > performance increases significantly.
> > >
> > > Any thoughts on how I can troubleshoot why Drill is picking that
> > > particular node?
> >
> >
>
>
> --
>  Steven Phillips
>  Software Engineer
>
>  mapr.com
>
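
Adam suspects data locality, and Jacques suggests the cause may simply be an undesirable work distribution. The toy model below is not Drill's actual assignment logic; it is a sketch, under stated assumptions, of how a locality-first assigner with a deterministic tie-break can send every unit of work to the same node even when HDFS holds a replica of each block on both data nodes, and how a load-aware tie-break spreads it. The unit IDs, host names, and both functions are hypothetical.

```python
from collections import Counter

# Toy model: every work unit (e.g. a Parquet row group) has a replica on
# both data nodes, mirroring the fully replicated HDFS setup in the thread.
# Host names are hypothetical.
UNITS = [{"id": i, "replicas": ["datanode1", "datanode2"]} for i in range(12)]


def assign_first_local(units):
    """Locality-first with a deterministic tie-break: always take the first
    replica host, so every unit lands on the same node."""
    return Counter(u["replicas"][0] for u in units)


def assign_least_loaded_local(units):
    """Locality-first, but break ties by picking the local host with the
    fewest units assigned so far (host name breaks exact ties)."""
    load = Counter()
    for u in units:
        host = min(u["replicas"], key=lambda h: (load[h], h))
        load[host] += 1
    return load


if __name__ == "__main__":
    print("first-local: ", dict(assign_first_local(UNITS)))
    print("least-loaded:", dict(assign_least_loaded_local(UNITS)))
```

In this model, hosts with no local replica (such as the added task nodes) never receive work under either policy, which would be consistent with adding nodes not relieving the hot spot. Whether Drill's planner actually behaves this way is what the query profiles Andries mentions would help confirm.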
