hbase-user mailing list archives

From Luke Forehand <luke.foreh...@networkedinsights.com>
Subject Re: Hanging regionservers
Date Fri, 16 Jul 2010 17:55:33 GMT
I was about to migrate to CDH3b2 but thought I would wait for a few replies before doing so.
I'll likely have the migration done over the weekend.

Java:
java version "1.6.0_16"
Java(TM) SE Runtime Environment (build 1.6.0_16-b01)
Java HotSpot(TM) 64-Bit Server VM (build 14.2-b01, mixed mode)

OS:
CentOS release 5.5 (Final)
2.6.18-194.3.1.el5 #1 SMP Thu May 13 13:08:30 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
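
(For reference, the above can typically be gathered with a few standard commands; the
redhat-release path assumes a Red Hat/CentOS-style install:)

    java -version              # JVM vendor, version, and build
    cat /etc/redhat-release    # distribution string, e.g. "CentOS release 5.5 (Final)"
    uname -a                   # kernel release/build and architecture (also prints the hostname)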

Thanks
Luke

On 7/16/10 12:45 PM, "Stack" <stack@duboce.net> wrote:

Yeah, go to CDH3b2 if you can.  If you can repro there, there are a
few fellas (other than us hbasers) who'd be really interested in your
problem.
St.Ack

On Fri, Jul 16, 2010 at 10:40 AM, Stack <stack@duboce.net> wrote:
> Each time you threaddump, is it stuck in the same way?
>
> I've not seen this dfsclient hangup before, not that I remember.  Let
> me ask some hdfs-heads.  Will be back to you.
>
> Any chance of you upping to CDH3b2, for your Hadoop at least?  HDFS
> has a few dfsclient/ipc fixes -- though looking at them none seem to
> explicitly address your issue.
>
> What's the JVM that you are running?  Can you do a java -version?
> What's your OS?
>
> Thanks,
> St.Ack
>
>
>
> On Fri, Jul 16, 2010 at 10:10 AM, Luke Forehand
> <luke.forehand@networkedinsights.com> wrote:
>> Line 58 and line 79 are the threads that I found suspicious.
>>
>> http://pastebin.com/W1E2nCZq
>>
>> The other stack traces from the other two region servers look identical to this one.
>> BTW - I have made the config changes per Ryan Rawson's suggestion (thanks!) and I've
>> processed ~7 GB of the 15 GB without a hangup thus far, so I'm crossing my fingers.
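>>
>> (To answer the "is it stuck the same way each time" question, dumps can be taken a
>> little apart and grepped for the suspect block; a minimal sketch, assuming jstack and
>> jps from the JDK are on the PATH and HRegionServer is the regionserver's main class:)
>>
>>    RS_PID=$(jps | awk '/HRegionServer/ {print $1}')     # pid of the regionserver JVM
>>    for i in 1 2 3; do
>>        jstack $RS_PID > rs-dump.$i.txt                   # full thread dump
>>        grep -c "blk_1926230463847049982" rs-dump.$i.txt  # same block still stuck?
>>        sleep 30
>>    done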
>>
>> -Luke
>>
>> On 7/16/10 11:48 AM, "Stack" <stack@duboce.net> wrote:
>>
>> Would you mind pastebinning the stack trace?  It doesn't look like
>> https://issues.apache.org/jira/browse/HDFS-88 (HBASE-667) going by the
>> below, an issue that HADOOP-5859 purportedly fixes -- I see you
>> commented on it -- but our Todd thinks otherwise (He has a 'real' fix
>> up in another issue that I currently can't put my finger on).
>> St.Ack
>>
>> On Fri, Jul 16, 2010 at 7:19 AM, Luke Forehand
>> <luke.forehand@networkedinsights.com> wrote:
>>>
>>> I grepped yesterday's logs on all servers for "Blocking updates" and there was
>>> no trace.  I believe I had encountered the blocking updates problem earlier in the
>>> project but throttled down the import speed, which seemed to fix that.
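>>>
>>> (A minimal version of that check, assuming the stock hbase-<user>-regionserver-<host>.log
>>> naming; the log directory varies by install, so the path below is a placeholder:)
>>>
>>>    grep "Blocking updates" /path/to/hbase/logs/hbase-*-regionserver-*.log*   # no output means no trace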
>>>
>>> I just double checked and all three region servers were idle.  Something interesting
>>> that I noticed, however, was that each regionserver had a particular ResponseProcessor
>>> thread running for a specific block, and that thread was stuck in a running state for
>>> the entire duration of the hang.  Also, a DataStreamer thread for the block associated
>>> with the ResponseProcessor was in a wait state.  This makes me think that each server
>>> was stuck operating on a specific block.
>>>
>>> "ResponseProcessor for block blk_1926230463847049982_2694658" - Thread t@61160
>>>   java.lang.Thread.State: RUNNABLE
>>>    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>>>    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
>>>    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
>>>    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
>>>    - locked sun.nio.ch.Util$1@196fbfd0
>>>    - locked java.util.Collections$UnmodifiableSet@7799fdbb
>>>    - locked sun.nio.ch.EPollSelectorImpl@1ee13d55
>>>    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
>>>    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:332)
>>>    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
>>>    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>>    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>>    at java.io.DataInputStream.readFully(DataInputStream.java:178)
>>>    at java.io.DataInputStream.readLong(DataInputStream.java:399)
>>>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:2399)
>>>
>>>   Locked ownable synchronizers:
>>>    - None
>>>
>>> "DataStreamer for file /hbase/.logs/dn01.colo.networkedinsights.com,60020,1279222293084/hlog.dat.1279228611023
>>> block blk_1926230463847049982_2694658" - Thread t@61158
>>>   java.lang.Thread.State: TIMED_WAITING on java.util.LinkedList@475b455c
>>>    at java.lang.Object.wait(Native Method)
>>>    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2247)
>>>
>>>   Locked ownable synchronizers:
>>>    - None
>>>
>>
>>
>
