hbase-user mailing list archives

From Nanheng Wu <nanhen...@gmail.com>
Subject Re: What's the region server doing?
Date Wed, 02 Mar 2011 01:43:07 GMT
And what's "next?" .... and what's next?

On Tue, Mar 1, 2011 at 5:41 PM, Nanheng Wu <nanhengwu@gmail.com> wrote:
> I just took the stack trace of both master and the meta RS. The
> master's still waiting for that thread which called "next", but no IPC
> Server handler on the RS has that call. Is that possible? Or have I
> just stared at this thing for too long?
> On Tue, Mar 1, 2011 at 5:32 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>> Yes, and on the other side (which is the region server that hosts
>> .META.) you should be able to see that call. Well, not that specific
>> one, but one of them :)
>> J-D
>> On Tue, Mar 1, 2011 at 5:30 PM, Nanheng Wu <nanhengwu@gmail.com> wrote:
>>> You said "next". I don't know if this is related at all, but from the
>>> master's thread dump, it says the disable is blocked by this thread
>>> below, and it's calling next:
>>> Thread 27 (RegionManager.metaScanner):
>>>  State: WAITING
>>>  Blocked count: 69503
>>>  Waited count: 68805
>>>  Waiting on org.apache.hadoop.hbase.ipc.HBaseClient$Call@42fcac6
>>>  Stack:
>>>    java.lang.Object.wait(Native Method)
>>>    java.lang.Object.wait(Object.java:485)
>>>    org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:722)
>>>    org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
>>>    $Proxy1.next(Unknown Source)
>>>    org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:179)
>>>    org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:73)
>>>    org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
>>>    org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:153)
>>>    org.apache.hadoop.hbase.Chore.run(Chore.java:68)
>>> On Tue, Mar 1, 2011 at 5:22 PM, Nanheng Wu <nanhengwu@gmail.com> wrote:
>>>> Thanks man I'll try that and post back when I find something. BTW, I
>>>> ran the script to set the memstore flush size on .META., now I am
>>>> seeing a lot less writing to HDFS from the .META. RS and less
>>>> compaction, unfortunately it's still slow. :(
>>>> On Tue, Mar 1, 2011 at 5:15 PM, Jean-Daniel Cryans <jdcryans@apache.org> wrote:
>>>>> In that specific jstack it's doing nothing at all, but keep in mind
>>>>> that it's only a snapshot of a precise moment in time. Try jstack'ing
>>>>> a few times and at some point you should see the threads named like
>>>>> "IPC Server handler xx on 60020" (where xx is a number) showing bigger
>>>>> stack traces with HRegionServer doing stuff like get, next, put, etc.
>>>>> You should also try scanning '.META.' from the shell and if it's slow,
>>>>> do the jstack'ing at the same time.
>>>>> J-D
>>>>> On Tue, Mar 1, 2011 at 5:07 PM, Nanheng Wu <nanhengwu@gmail.com> wrote:
>>>>>> My cluster (10 nodes, hbase-0.20.6 + hadoop 0.20.2) is very very slow
>>>>>> for any operation like disable table or delete. Master's thread dump
>>>>>> says they are blocked by the metaScanner thread. When I looked at the
>>>>>> log file on the .META. RS there are no outputs at all! (INFO debug
>>>>>> level). J-D has been helping me on this, we pretty much figured out
>>>>>> that RegionManager.metaScanner is the culprit, because it's taking
>>>>>> around 25 minutes to scan 8K rows. What I don't get is what the region
>>>>>> server is actually doing during this time. There's no request at all
>>>>>> on the cluster, no RS splits either because we just use a MR job to
>>>>>> output HFiles and never write again.
>>>>>> J-D has been really really helpful, but I feel like I took too much of
>>>>>> his time. Below is the thread dump of the .META. RS during the time
>>>>>> when disable commands are blocked on the meta scanner. Can someone help me
>>>>>> figure out what the server is doing? Is it running any thread at all?
>>>>>> Thank you!
>>>>>> http://pastebin.com/CZQAywq3
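
The advice earlier in the thread (take several jstack snapshots of the .META. region server and look for "IPC Server handler xx on 60020" threads that are inside HRegionServer calls) could be partly automated. The following is only a rough sketch: it assumes the dumps are in the usual HotSpot jstack text layout (a quoted thread name on the first line, "at ..." frame lines, blank lines between threads), and the function name busy_handlers and its marker parameter are made up for illustration.

```python
import re

# Thread header as printed by jstack, e.g. "IPC Server handler 3 on 60020" daemon ...
HANDLER_RE = re.compile(r'^"(IPC Server handler \d+ on \d+)"')
# Stack frame line, e.g. "\tat pkg.Class.method(File.java:123)"
FRAME_RE = re.compile(r'^\s*at\s+(\S+)\(')


def busy_handlers(dump_text, marker="HRegionServer."):
    """Return (thread_name, frame) pairs for IPC handler threads
    whose stack contains a frame matching `marker`."""
    results = []
    # Assumption: threads are separated by blank lines in the dump.
    for block in re.split(r'\n\s*\n', dump_text.strip()):
        lines = block.splitlines()
        if not lines:
            continue
        header = HANDLER_RE.match(lines[0])
        if not header:
            continue  # not an IPC handler thread
        for line in lines[1:]:
            frame = FRAME_RE.match(line)
            if frame and marker in frame.group(1):
                # Record the topmost matching frame, then move to the next thread.
                results.append((header.group(1), frame.group(1)))
                break
    return results
```

Running this over each saved snapshot, taken while a '.META.' scan from the shell is slow, would list which handlers were in get, next, put and so on at that moment. An empty result for every snapshot would match the "doing nothing at all" observation above.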
