hbase-user mailing list archives

From Joost Ouwerkerk <jo...@openplaces.org>
Subject Re: Deadlock when mapping a table?
Date Mon, 12 Apr 2010 19:18:46 GMT
Thread dump of TaskTracker:
  http://gist.github.com/363898

Thread dump of RegionServer:
  http://gist.github.com/363899

Not clear what's going on.  I'm going to have a look at HBASE-2180...

joost.

On Sat, Apr 10, 2010 at 10:41 PM, Stack <stack@duboce.net> wrote:

> On Sat, Apr 10, 2010 at 4:38 PM, Joost Ouwerkerk <joost@openplaces.org>
> wrote:
> > We're mapping a table with about 2 million rows in 100 regions on 40
> > nodes.  In each map, we're doing a random read on the same table.
> > We're encountering a situation that looks a lot like deadlock.  When
> > the job is launched, some of the tasktrackers appear to get blocked
> > doing the first random read.  The only trace we get is an eventual
> > Unknown Scanner Exception in the RegionServer log, at which point the
> > task is actually reported as successfully completed by MapReduce (1 row
> > processed).  There is no error in the task's log.  The job completes as
> > SUCCESSFUL with an incomplete number of rows.  In the worst-case
> > scenario, we've actually seen ALL the tasktrackers encounter this
> > problem; the job completes successfully with 100 rows processed (1 per
> > region).
>
>
> Any chance of a thread dump on the problematic RS at the time?  Can
> you even figure out the culprit?  There is a known deadlock that can
> happen when writing (HBASE-2322), but this seems like something else.
> If it's a deadlock, the JVM can often recognize it as such, and it'll
> be detailed at the tail of the thread dump.  Todd has been messing
> with jcarder (sp?) too.  That found HBASE-2322, but that's all it found,
> I believe (I need to run it on the next release candidate before it
> becomes a release candidate).  Maybe you're running into very slow
> reads because you don't have HBASE-2180?
>
> St.Ack
>
>
>
> >
> > When we remove the code that does the random read in the map, there
> > are no problems.
> >
> > Anyone?  This is driving me crazy because I can't reproduce it locally
> > (it only seems to be a problem in a distributed environment with many
> > nodes) and because there is no stacktrace besides the scanner exception
> > (which is clearly a symptom, not a cause).
> >
> > j
> >
>
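
The JVM-level deadlock detection Stack mentions (the report at the tail of a thread dump) can also be queried from code through the standard java.lang.management API. The following is only a minimal sketch, assuming Java 8 or later and a JVM that supports synchronizer monitoring. The class name DeadlockCheck, its methods, and the demo thread names are hypothetical and not part of HBase or Hadoop. Note that this only finds lock cycles inside a single JVM; it would not flag a stall caused by slow reads or by one process waiting on another across the network.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class DeadlockCheck {

    /** Returns the names of threads the JVM currently reports as deadlocked. */
    public static List<String> deadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when no deadlock is found
        List<String> names = new ArrayList<>();
        if (ids == null) {
            return names;
        }
        for (ThreadInfo info : mx.getThreadInfo(ids)) {
            if (info != null) { // a thread may have exited in the meantime
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    /** Polls until at least two deadlocked threads are seen or the timeout expires. */
    public static List<String> waitForDeadlock(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        List<String> names = deadlockedThreadNames();
        while (names.size() < 2 && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            names = deadlockedThreadNames();
        }
        return names;
    }

    /** Demo only: starts two daemon threads that take two locks in opposite order. */
    public static void startDeadlockedPair() {
        final Object lockA = new Object();
        final Object lockB = new Object();
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);
        Thread t1 = new Thread(() -> grab(lockA, lockB, bothHoldFirst), "demo-deadlock-1");
        Thread t2 = new Thread(() -> grab(lockB, lockA, bothHoldFirst), "demo-deadlock-2");
        t1.setDaemon(true); // daemon threads do not keep the JVM alive
        t2.setDaemon(true);
        t1.start();
        t2.start();
    }

    private static void grab(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // wait until both threads hold their first lock
            } catch (InterruptedException e) {
                return;
            }
            synchronized (second) {
                // never reached: the other thread holds this lock and waits for ours
            }
        }
    }

    public static void main(String[] args) {
        startDeadlockedPair();
        System.out.println("Deadlocked threads: " + waitForDeadlock(5000));
    }
}
```

If a check like this (or the tail of a jstack dump) reports nothing on the RegionServer or the TaskTracker child JVM, that points away from a true Java-level deadlock and toward something like the slow reads suggested above.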
