hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Deadlocked Regionserver process
Date Fri, 15 Jul 2011 06:03:05 GMT
Add a comment to the issue, Ram.  Use of the heavy-weight Date class seems odd for sure.
St.Ack
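
A minimal sketch of the alternative JD describes in the quoted reply below, keeping the creation time as a long from System.currentTimeMillis rather than a Date. QueuedRequest and its fields are hypothetical names for illustration only; this is not the actual CompactionRequest code or the fix applied in HBase.

```java
// Hypothetical stand-in for a queued compaction request; not the HBase class.
final class QueuedRequest {
    private final String regionName;
    private final int priority;
    // Creation time as epoch milliseconds instead of a java.util.Date.
    private final long createdAtMillis;

    QueuedRequest(String regionName, int priority, long createdAtMillis) {
        this.regionName = regionName;
        this.priority = priority;
        this.createdAtMillis = createdAtMillis;
    }

    // Age of the request relative to a caller-supplied clock reading.
    long ageMillis(long nowMillis) {
        return nowMillis - createdAtMillis;
    }

    @Override
    public String toString() {
        // Appending a long formats plain digits; unlike Date.toString it does
        // not look up time-zone display names through ResourceBundle.
        return "regionName=" + regionName
            + ", priority=" + priority
            + ", time=" + createdAtMillis;
    }

    public static void main(String[] args) {
        QueuedRequest r = new QueuedRequest("r1", 3, System.currentTimeMillis());
        System.out.println(r);
    }
}
```

The trade-off is a less readable log line; if a human-readable time is wanted, it could be formatted later, outside any code path that holds locks.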

On Thu, Jul 14, 2011 at 9:33 PM, Ramkrishna S Vasudevan
<ramakrishnas@huawei.com> wrote:
> Sorry, it's not the Data class.
>
> But the problem is in the use of Date class.  JD had once replied to the
> mailing list with the heading Re: Possible dead lock
> :)
>
> Regards
> Ram
>
> -----Original Message-----
> From: Ramkrishna S Vasudevan [mailto:ramakrishnas@huawei.com]
> Sent: Friday, July 15, 2011 9:26 AM
> To: user@hbase.apache.org
> Subject: RE: Deadlocked Regionserver process
>
> Hi
>
> I think this, as Stack mentioned in HBASE-3830, could be due to a profiler.
>
> But the problem is in the use of Data class.  JD had once replied to the
> mailing list with the heading Re: Possible dead lock
>
> JD's reply
> =============================================================
> I see what you are saying, and I understand the deadlock, but what escapes
> me is why ResourceBundle has to go touch all the classes every time to find
> the locale as I see 2 threads doing the same. Maybe my understanding of what
> it does is just poor, but I also see that you are using the yourkit profiler
> so it's one more variable in the equation.
>
> In any case, using a Date strikes me as odd. Using a long representing
> System.currentTimeMillis is usually what we do.
> =======================================================================
> So here, as per HBASE-4101, even though the profiler was not running, the
> problem is the Date object called from the toString of the PriorityCompactionQueue.
>
> Regards
> Ram
>
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> Sent: Friday, July 15, 2011 3:56 AM
> To: user@hbase.apache.org
> Subject: Re: Deadlocked Regionserver process
>
> Thank you.
>
> I've added below to issue.  Will take a looksee.  If issue, will
> include fix in 0.90.4.
>
> St.Ack
>
> On Thu, Jul 14, 2011 at 3:07 PM, Matt Davies <matt.davies@tynt.com> wrote:
>> We aren't profiling right now.  Here's what is in the hbase-env.sh
>>
>> export TZ="US/Mountain"
>> export HBASE_OPTS="$HBASE_OPTS -XX:+UseConcMarkSweepGC
>> -XX:+CMSIncrementalMode -verbose:gc -XX:+PrintGCDetails
>> -XX:+PrintGCTimeStamps -Xloggc:/home/hadoop/gc-hbase.log "
>> export HBASE_MANAGES_ZK=false
>> export HBASE_PID_DIR=/home/hadoop
>> export HBASE_HEAPSIZE=10240
>>
>> Java is
>> java version "1.6.0_17"
>> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
>> Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)
>>
>> We were planning an upgrade to 1.6.0_25 before we ran into this issue.
>>
>>
>>
>> On Thu, Jul 14, 2011 at 3:59 PM, Stack <stack@duboce.net> wrote:
>>
>>> What Lohit says but also, what jvm are you running and what options
>>> are you feeding it?  The stack trace is a little crazy (especially the
>>> mix in of resource bundle loading).  We saw something similar over in
>>> HBASE-3830 when someone was running profiler.  Is that what is going
>>> on here?
>>>
>>> Thanks,
>>> St.Ack
>>>
>>> On Thu, Jul 14, 2011 at 11:36 AM, Matt Davies <matt.davies@tynt.com>
>>> wrote:
>>> > Hey everyone,
>>> >
>>> > We periodically see a situation where the regionserver process exists in the
>>> > process list, zookeeper thread sends the keepalive so the master won't
>>> > remove it from the active list, yet the regionserver will not serve data.
>>> >
>>> > Hadoop(cdh3u0), HBase 0.90.3 (Apache version), under load from an internal
>>> > testing tool.
>>> >
>>> >
>>> > I've taken a jstack of the process and found this:
>>> >
>>> > Found one Java-level deadlock:
>>> > =============================
>>> > "IPC Server handler 99 on 60020":
>>> >  waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
>>> >  which is held by "IPC Server handler 64 on 60020"
>>> > "IPC Server handler 64 on 60020":
>>> >  waiting for ownable synchronizer 0x00002aaab8eea130, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
>>> >  which is held by "regionserver60020.cacheFlusher"
>>> > "regionserver60020.cacheFlusher":
>>> >  waiting to lock monitor 0x0000000047f97000 (object 0x00002aaab8ef07e8, a org.apache.hadoop.hbase.regionserver.MemStoreFlusher),
>>> >  which is held by "IPC Server handler 64 on 60020"
>>> >
>>> > Java stack information for the threads listed above:
>>> > ===================================================
>>> > "IPC Server handler 99 on 60020":
>>> >        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:434)
>>> >        - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>>> >        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
>>> >        at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
>>> >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> >        at java.lang.reflect.Method.invoke(Method.java:597)
>>> >        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>>> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
>>> > "IPC Server handler 64 on 60020":
>>> >        at sun.misc.Unsafe.park(Native Method)
>>> >        - parking to wait for  <0x00002aaab8eea130> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>>> >        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>> >        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
>>> >        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
>>> >        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
>>> >        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
>>> >        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
>>> >        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.reclaimMemStoreMemory(MemStoreFlusher.java:435)
>>> >        - locked <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>>> >        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:2529)
>>> >        at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
>>> >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> >        at java.lang.reflect.Method.invoke(Method.java:597)
>>> >        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>>> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039)
>>> > "regionserver60020.cacheFlusher":
>>> >        at java.util.ResourceBundle.endLoading(ResourceBundle.java:1506)
>>> >        - waiting to lock <0x00002aaab8ef07e8> (a org.apache.hadoop.hbase.regionserver.MemStoreFlusher)
>>> >        at java.util.ResourceBundle.findBundle(ResourceBundle.java:1379)
>>> >        at java.util.ResourceBundle.findBundle(ResourceBundle.java:1292)
>>> >        at java.util.ResourceBundle.getBundleImpl(ResourceBundle.java:1234)
>>> >        at java.util.ResourceBundle.getBundle(ResourceBundle.java:832)
>>> >        at sun.util.resources.LocaleData$1.run(LocaleData.java:127)
>>> >        at java.security.AccessController.doPrivileged(Native Method)
>>> >        at sun.util.resources.LocaleData.getBundle(LocaleData.java:125)
>>> >        at sun.util.resources.LocaleData.getTimeZoneNames(LocaleData.java:97)
>>> >        at sun.util.TimeZoneNameUtility.getBundle(TimeZoneNameUtility.java:115)
>>> >        at sun.util.TimeZoneNameUtility.retrieveDisplayNames(TimeZoneNameUtility.java:80)
>>> >        at java.util.TimeZone.getDisplayNames(TimeZone.java:399)
>>> >        at java.util.TimeZone.getDisplayName(TimeZone.java:350)
>>> >        at java.util.Date.toString(Date.java:1025)
>>> >        at java.lang.String.valueOf(String.java:2826)
>>> >        at java.lang.StringBuilder.append(StringBuilder.java:115)
>>> >        at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue$CompactionRequest.toString(PriorityCompactionQueue.java:114)
>>> >        at java.lang.String.valueOf(String.java:2826)
>>> >        at java.lang.StringBuilder.append(StringBuilder.java:115)
>>> >        at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.addToRegionsInQueue(PriorityCompactionQueue.java:145)
>>> >        - locked <0x00002aaab8f2dc58> (a java.util.HashMap)
>>> >        at org.apache.hadoop.hbase.regionserver.PriorityCompactionQueue.add(PriorityCompactionQueue.java:188)
>>> >        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:140)
>>> >        - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
>>> >        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.requestCompaction(CompactSplitThread.java:118)
>>> >        - locked <0x00002aaab8894048> (a org.apache.hadoop.hbase.regionserver.CompactSplitThread)
>>> >        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:393)
>>> >        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:366)
>>> >        at org.apache.hadoop.hbase.regionserver.MemStoreFlusher.run(MemStoreFlusher.java:240)
>>> >
>>> >
>>> > Any ideas on how I could prevent this or let the master know about it? I've
>>> > written an app that will check all regionservers periodically for such a
>>> > lockup, but I can't run it constantly.
>>> >
>>> > I can provide more of the jstack if that is helpful.
>>> >
>>> > -Matt
>>> >
>>>
>>
>
>
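
On Matt's question about spotting such a lockup without running jstack by hand: one possible approach, sketched here under assumptions, is to ask the JVM itself through the standard java.lang.management API. ThreadMXBean.findDeadlockedThreads reports cycles involving both object monitors and ownable synchronizers such as ReentrantLock, which is the mix shown in the dump above. DeadlockCheck is a hypothetical helper name; how it would be scheduled inside a regionserver, exposed over JMX, or reported to the master is not something this thread covers.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper; it must run inside the JVM being checked
// (or be adapted to query a remote ThreadMXBean over JMX).
final class DeadlockCheck {

    // Returns one line per deadlocked thread, or an empty string if none.
    static String describeDeadlocks() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        // Returns null when no deadlock is found.
        long[] ids = bean.findDeadlockedThreads();
        if (ids == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        // true, true: include locked monitors and locked synchronizers.
        for (ThreadInfo info : bean.getThreadInfo(ids, true, true)) {
            if (info == null) {
                continue; // thread ended between the two calls
            }
            sb.append('"').append(info.getThreadName())
              .append("\" waiting on ").append(info.getLockName())
              .append(" held by \"").append(info.getLockOwnerName())
              .append("\"\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String report = describeDeadlocks();
        System.out.println(report.isEmpty() ? "no deadlock found" : report);
    }
}
```

A check like this only detects the condition; it does not prevent it, and what to do on detection (log, alert, abort the process) is a separate decision.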
