hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Heng Chen <heng.chen.1...@gmail.com>
Subject Re: Memstore blocking
Date Fri, 03 Jun 2016 03:42:45 GMT
Something wrong in snappy Library?

Have you try to not use compression?

2016-06-03 11:13 GMT+08:00 吴国泉wgq <wgq.wu@qunar.com>:

> HI STACK:
>
>    1.   The log is very large,so I pick some of it. But it seems not
> provide valuable info.Here is the region named
> qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be  can’t flush.
>
>       When the flush Thread works well, The log is like this:
>       2016-05-24 12:38:27,071 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 16681
> 2016-05-24 12:38:37,071 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 8684
> 2016-05-24 12:38:43,753 INFO  [MemStoreFlusher.2] regionserver.HRegion:
> Started memstore flush for
> qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., current region
> memstore size 305.3 K
> 2016-05-24 12:38:43,753 WARN  [MemStoreFlusher.2] wal.FSHLog: Couldn't
> find oldest seqNum for the region we are about to flush:
> [dd8f92e3c161a8534b30ab17c28ae8be]
> 2016-05-24 12:38:43,816 INFO  [MemStoreFlusher.2]
> regionserver.DefaultStoreFlusher: Flushed, sequenceid=54259, memsize=305.3
> K, hasBloomFilter=true, into tmp file
> hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/.tmp/fddbb05945354d5cbdae4afd24e5bb9d
> 2016-05-24 12:38:43,822 DEBUG [MemStoreFlusher.2]
> regionserver.HRegionFileSystem: Committing store file
> hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/.tmp/fddbb05945354d5cbdae4afd24e5bb9d
> as
> hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/t/fddbb05945354d5cbdae4afd24e5bb9d
> 2016-05-24 12:38:43,837 INFO  [MemStoreFlusher.2] regionserver.HStore:
> Added
> hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/t/fddbb05945354d5cbdae4afd24e5bb9d,
> entries=108, sequenceid=54259, filesize=68.3 K
> 2016-05-24 12:38:43,837 INFO  [MemStoreFlusher.2] regionserver.HRegion:
> Finished memstore flush of ~305.3 K/312664, currentsize=0/0 for region
> qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. in 84ms,
> sequenceid=54259, compaction requested=true
>
>       And when the  flush Thread does not  work well,The log just always
> shows :
> 2016-05-25 14:57:02,588 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 18068
> 2016-05-25 14:57:12,587 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 13165
> 2016-05-25 14:57:20,656 DEBUG [MemStoreFlusher.36] regionserver.HRegion:
> NOT flushing memstore for region
> qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true,
> writesEnabled=true
> 2016-05-25 14:57:22,587 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 5526
> 2016-05-25 14:57:28,113 DEBUG [MemStoreFlusher.34] regionserver.HRegion:
> NOT flushing memstore for region
> qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true,
> writesEnabled=true
> 2016-05-25 14:57:32,587 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 8178
> 2016-05-25 14:57:40,767 DEBUG [MemStoreFlusher.9] regionserver.HRegion:
> NOT flushing memstore for region
> qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true,
> writesEnabled=true
> 2016-05-25 14:57:42,587 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 22068
> 2016-05-25 14:57:52,587 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 5492
> 2016-05-25 14:58:02,587 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 10472
> 2016-05-25 14:58:04,655 DEBUG [MemStoreFlusher.23] regionserver.HRegion:
> NOT flushing memstore for region
> qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true,
> writesEnabled=true
> 2016-05-25 14:58:12,587 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 8435
>
>      I check the log before  the region can’t flush, but find nothing
> special from usual.
>      by the way, this table has a heavy load on  writing, so I turn off
> the WAL.
>
>
> 2. It is the  disk is filling faster than on other machines.  Not the
> instat show.
>
> 3.this is the jstack log of the memstoreflusher:
> "MemStoreFlusher.14" prio=10 tid=0x00007fe69487d800 nid=0x387d waiting on
> condition [0x00007fce7118c000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x00007fd09e031470> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> at java.util.concurrent.DelayQueue.poll(DelayQueue.java:264)
> at java.util.concurrent.DelayQueue.poll(DelayQueue.java:68)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:233)
> at java.lang.Thread.run(Thread.java:744)
>
> "MemStoreFlusher.13" prio=10 tid=0x00007fe69487b800 nid=0x387c runnable
> [0x00007fce7128c000]
>    java.lang.Thread.State: RUNNABLE
> at
> org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressBytesDirect(Native
> Method)
> at
> org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:232)
> - locked <0x00007fcfe3ba1190> (a
> org.apache.hadoop.io.compress.snappy.SnappyCompressor)
> at
> org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149)
> at
> org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142)
> at
> org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.compressAfterEncoding(HFileBlockDefaultEncodingContext.java:238)
> at
> org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.compressAfterEncodingWithBlockType(HFileBlockDefaultEncodingContext.java:149)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlock(HFileBlock.java:783)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.ensureBlockReady(HFileBlock.java:761)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlockAndWriteHeaderAndData(HFileBlock.java:878)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:864)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateBlock(HFileBlockIndex.java:948)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateLevel(HFileBlockIndex.java:931)
> at
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIndexBlocks(HFileBlockIndex.java:840)
> at
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:367)
> at
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
> at
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
> at
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:70)
> - locked <0x00007fd8d1778be8> (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:829)
> at
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1989)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1779)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1662)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1575)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:461)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:435)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:66)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:253)
> at java.lang.Thread.run(Thread.java:744)
>
> "MemStoreFlusher.12" prio=10 tid=0x00007fe694879800 nid=0x387b waiting on
> condition [0x00007fce7138e000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x00007fd09e031470> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> at java.util.concurrent.DelayQueue.poll(DelayQueue.java:264)
> at java.util.concurrent.DelayQueue.poll(DelayQueue.java:68)
> at
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:233)
> at java.lang.Thread.run(Thread.java:744)
>
> 在 2016年6月3日,上午12:08,Stack <stack@duboce.net<mailto:stack@duboce.net>>
写道:
>
> On Wed, Jun 1, 2016 at 3:03 AM, 吴国泉wgq <wgq.wu@qunar.com<mailto:
> wgq.wu@qunar.com>> wrote:
>
> hi all:
>
> 1.Is region always on same machine or do you see this phenomenon on more
> than one machine?
>       Not always on the same machine, but always on the machine which
> hold the first region of a table(the only table that its first region
> cannot flush,when restart the regionserver,  the first region would move to
> another machine)
>
> 2.The RS is ABORTED? Because it can't flush? Is that what it says in the
> log? Can we see the log message around the ABORT?
>       sorry,I did not express clear here. It is the MemStore of the first
> region can’t flush, not the RS.
>       The RS Log is like this:
>       INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 9452
>       DEBUG [MemStoreFlusher.39] regionserver.HRegion
> memstore for region
> qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true,
> writesEnabled=true
>
>       And the web UI shows:
>       Aborted flushing .(Not flushing since already flushing.) But  the
> Flusher thread never finish.
>
>
>
>
> Can you back up in the log for this region? It looks like the first flush
> attempt failed or never completed. Perhaps there is a clue in the log
> message from the first attempt as to why the failure. The above log seems
> to be from a later stage, when we have gotten into the unhealthy state
> where we can't flush the first region memstore.
>
>
>
> 3.100% disk only? The CPU does not go up too?  Can we see a thread dump?
> Do jstack -l  PID if you can
>       Only the disk usage(command : df -h) increases faster than other
> machine, not the IO usage. The usage of CPU is very low.
>
>
>
> Sorry. Your disk is filling faster than on other machines? Or the iostat
> shows the disk is doing more seeks than on other machines?
>
>
>
> 4.Any other special configurations going on on this install? Phoenix or
> use of Coprocessors?
>       NO, no phoenix. Only AccessController coprocessor.
>
> 5.If you thread dump a few times, it is always stuck here?
>       Yes, always stuck here. here is the jstack log.(in this log, it is
> the MemStoreFlusher.13 can ’t flush)
>
>
>       PS: As I see. I think this is because the first region cannot
> flush cause the problem. But I do not know why it can’t flush and why just
> the first region of the only table has this problem.
>
>
>
> Please include pointer to the stack trace. You can't have attachments on
> mail to this list.
>
> Thank you,
> St.Ack
>
>
>
>
>
> 在 2016年6月1日,上午3:10,Stack <stack@duboce.net<mailto:stack@duboce.net>>
写道:
>
> On Mon, May 30, 2016 at 7:03 PM, 聪聪 <175998806@qq.com<mailto:
> 175998806@qq.com>> wrote:
>
> HI ALL:       Recently,I met a strange problem,  the first Region’s
> Memstore of one table (the only one) often blocked when flushing.(Both
> Version:  hbase-0.98.6-cdh5.2.0  and 1.0.1.1, I updated 0.98 to
> 1.0.1.1,hope to solve the problem,But failed)
>
>
>
> Is region always on same machine or do you see this phenomenon on more than
> one machine?
>
>
>
>      On the web UI, I can see the status shows:  ABORTED(since XXsec
> ago), Not flushing since already flushing.
>
>
>
> The RS is ABORTED? Because it can't flush? Is that what it says in the log?
> Can we see the log message around the ABORT?
>
>
>
>      But it will never flush success, and the usage of the disk will
> increase very high.Now other regionservers just use 30% of the disk
> capacity, the problematic region server will increase to 100%,unless
> killing the process.
>
>
>
> 100% disk only? The CPU does not go up too?
>
> Can we see a thread dump? Do jstack -l  PID if you can.
>
>
>
>      What’s more, the region server process cannot be shutdown
> normally,every time I have to use the KILL -9 command.
>      I check the log,the reason why cannot flush is one of the
> MemstoreFlusher cannot exiting.
>      The log is like blow:
>      2016-05-29 19:54:11,982 INFO  [MemStoreFlusher.13]
> regionserver.MemStoreFlusher: MemStoreFlusher.13 exiting
> 2016-05-29 19:54:13,016 INFO  [MemStoreFlusher.6]
> regionserver.MemStoreFlusher: MemStoreFlusher.6 exiting
> 2016-05-29 19:54:13,260 INFO  [MemStoreFlusher.16]
> regionserver.MemStoreFlusher: MemStoreFlusher.16 exiting
> 2016-05-29 19:54:16,032 INFO  [MemStoreFlusher.33]
> regionserver.MemStoreFlusher: MemStoreFlusher.33 exiting
> 2016-05-29 19:54:16,341 INFO  [MemStoreFlusher.25]
> regionserver.MemStoreFlusher: MemStoreFlusher.25 exiting
> 2016-05-29 19:54:16,620 INFO  [MemStoreFlusher.31]
> regionserver.MemStoreFlusher: MemStoreFlusher.31 exiting
> 2016-05-29 19:54:16,621 INFO  [MemStoreFlusher.29]
> regionserver.MemStoreFlusher: MemStoreFlusher.29 exiting
> 2016-05-29 19:54:16,621 INFO  [MemStoreFlusher.23]
> regionserver.MemStoreFlusher: MemStoreFlusher.23 exiting
> 2016-05-29 19:54:16,621 INFO  [MemStoreFlusher.32]
> regionserver.MemStoreFlusher: MemStoreFlusher.32 exiting
> 2016-05-29 19:54:16,621 INFO  [MemStoreFlusher.1]
> regionserver.MemStoreFlusher: MemStoreFlusher.1 exiting
> 2016-05-29 19:54:16,621 INFO  [MemStoreFlusher.38]
> regionserver.MemStoreFlusher: MemStoreFlusher.38 exiting
> 2016-05-29 19:54:16,621 INFO  [MemStoreFlusher.10]
> regionserver.MemStoreFlusher: MemStoreFlusher.10 exiting
> 2016-05-29 19:54:16,620 INFO  [MemStoreFlusher.7]
> regionserver.MemStoreFlusher: MemStoreFlusher.7 exiting
> 2016-05-29 19:54:16,620 INFO  [MemStoreFlusher.12]
> regionserver.MemStoreFlusher: MemStoreFlusher.12 exiting
> 2016-05-29 19:54:16,620 INFO  [MemStoreFlusher.21]
> regionserver.MemStoreFlusher: MemStoreFlusher.21 exiting
> 2016-05-29 19:54:16,622 INFO  [MemStoreFlusher.37]
> regionserver.MemStoreFlusher: MemStoreFlusher.37 exiting
> 2016-05-29 19:54:16,622 INFO  [MemStoreFlusher.24]
> regionserver.MemStoreFlusher: MemStoreFlusher.24 exiting
> 2016-05-29 19:54:16,806 INFO  [MemStoreFlusher.39]
> regionserver.MemStoreFlusher: MemStoreFlusher.39 exiting
> 2016-05-29 19:54:16,908 INFO  [MemStoreFlusher.17]
> regionserver.MemStoreFlusher: MemStoreFlusher.17 exiting
> 2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.3]
> regionserver.MemStoreFlusher: MemStoreFlusher.3 exiting
> 2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.27]
> regionserver.MemStoreFlusher: MemStoreFlusher.27 exiting
> 2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.18]
> regionserver.MemStoreFlusher: MemStoreFlusher.18 exiting
> 2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.4]
> regionserver.MemStoreFlusher: MemStoreFlusher.4 exiting
> 2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.19]
> regionserver.MemStoreFlusher: MemStoreFlusher.19 exiting
> 2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.22]
> regionserver.MemStoreFlusher: MemStoreFlusher.22 exiting
> 2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.9]
> regionserver.MemStoreFlusher: MemStoreFlusher.9 exiting
> 2016-05-29 19:54:17,137 INFO  [MemStoreFlusher.36]
> regionserver.MemStoreFlusher: MemStoreFlusher.36 exiting
> 2016-05-29 19:54:17,263 INFO  [MemStoreFlusher.11]
> regionserver.MemStoreFlusher: MemStoreFlusher.11 exiting
> 2016-05-29 19:54:17,499 INFO  [MemStoreFlusher.14]
> regionserver.MemStoreFlusher: MemStoreFlusher.14 exiting
> 2016-05-29 19:54:17,499 INFO  [MemStoreFlusher.2]
> regionserver.MemStoreFlusher: MemStoreFlusher.2 exiting
> 2016-05-29 19:54:17,499 INFO  [MemStoreFlusher.26]
> regionserver.MemStoreFlusher: MemStoreFlusher.26 exiting
> 2016-05-29 19:54:17,499 INFO  [MemStoreFlusher.0]
> regionserver.MemStoreFlusher: MemStoreFlusher.0 exiting
> 2016-05-29 19:54:17,499 INFO  [MemStoreFlusher.30]
> regionserver.MemStoreFlusher: MemStoreFlusher.30 exiting
> 2016-05-29 19:54:17,499 INFO  [MemStoreFlusher.28]
> regionserver.MemStoreFlusher: MemStoreFlusher.28 exiting
>
>
> here is the jstack log when I try to shutdown the region server process
> normally:
> we can see the MemstoreFlusher.34 meet some problems, it can not exiting
> normally
> (By the way, we can see “locked at snappy”, but it not always locked at
> snappy,sometime it will locked at other place,so I think snappy is ok, Here
> I just want to show the MemStoreFlusher meet some problem).
>
>
> Any other special configurations going on on this install? Phoenix or use
>
> of Coprocessors?
>
>
>
> "MemStoreFlusher.34" prio=10 tid=0x00007f20b8939000 nid=0x511b runnable
> [0x00007f08941c3000]
>   java.lang.Thread.State: RUNNABLE
> at
>
>
> org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressBytesDirect(Native
> Method)
> at
>
>
> org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:232)
> - locked <0x00007f0a5f85c430> (a
> org.apache.hadoop.io.compress.snappy.SnappyCompressor)
> at
>
>
> org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149)
> at
>
>
> org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142)
> at
>
>
> org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.compressAfterEncoding(HFileBlockDefaultEncodingContext.java:238)
> at
>
>
> org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.compressAfterEncodingWithBlockType(HFileBlockDefaultEncodingContext.java:149)
> at
>
>
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlock(HFileBlock.java:783)
> at
>
>
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.ensureBlockReady(HFileBlock.java:761)
> at
>
>
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlockAndWriteHeaderAndData(HFileBlock.java:878)
> at
>
>
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:864)
> at
>
>
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateBlock(HFileBlockIndex.java:948)
> at
>
>
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateLevel(HFileBlockIndex.java:931)
> at
>
>
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIndexBlocks(HFileBlockIndex.java:840)
> at
>
>
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:367)
> at
>
>
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
> at
>
>
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
> at
>
>
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:70)
> - locked <0x00007f09fdcccda0> (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:829)
> at
>
>
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1989)
> at
>
>
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1779)
> at
>
>
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1662)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1575)
> at
>
>
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:461)
> at
>
>
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:435)
> at
>
>
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:66)
> at
>
>
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:253)
> at java.lang.Thread.run(Thread.java:744)
>
>
>
> If you thread dump a few times, it is always stuck here?
>
> Thanks,
> St.Ack
>
>
>
>
> "MemStoreFlusher.33" prio=10 tid=0x00007f20b8937000 nid=0x511a waiting on
> condition [0x00007f08942c5000]
>   java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x00007f09b8e20f80> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> at
>
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> at java.util.concurrent.DelayQueue.poll(DelayQueue.java:264)
> at java.util.concurrent.DelayQueue.poll(DelayQueue.java:68)
> at
>
>
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:233)
> at java.lang.Thread.run(Thread.java:744)
>
>
>
>
>
>   I do not know why the memstore cannot flush.
>   Is this a bug?
>
>
>
>  下面是被转发的邮件:
>
> 发件人: quan <329853099@qq.com<mailto:329853099@qq.com>>
>
> 主题: memstore flush blocked
>
> 日期: 2016年5月30日 GMT+8 上午10:47:43
>
> 收件人: user@hbase.apache.org<mailto:user@hbase.apache.org>
>
> 抄送: 蒲聪 <cong.pu@qunar.com<mailto:cong.pu@qunar.com>>
>
>
>  HI ALL:       Recently,I met a strange problem,  the first Region’s
> Memstore of one table (the only one) often blocked when flushing.(Both
> Version:  hbase-0.98.6-cdh5.2.0  and 1.0.1.1, I updated 0.98 to
> 1.0.1.1,hope to solve the problem,But failed)
>      On the web UI, I can see the status shows:  ABORTED(since XXsec
> ago), Not flushing since already flushing.
>      But it will never flush success, and the usage of the disk will
> increase very high.Now other regionservers just use 30% of the disk
> capacity, the problematic region server will increase to 100%,unless
> killing the process.
>      What’s more, the region server process cannot be shutdown
> normally,every time I have to use the KILL -9 command.
>      I check the log,the reason why cannot flush is one of the
> MemstoreFlusher cannot exiting.
>      The log is like blow:
>      2016-05-29 19:54:11,982 INFO  [MemStoreFlusher.13]
> regionserver.MemStoreFlusher: MemStoreFlusher.13 exiting
> 2016-05-29 19:54:13,016 INFO  [MemStoreFlusher.6]
> regionserver.MemStoreFlusher: MemStoreFlusher.6 exiting
> 2016-05-29 19:54:13,260 INFO  [MemStoreFlusher.16]
> regionserver.MemStoreFlusher: MemStoreFlusher.16 exiting
> 2016-05-29 19:54:16,032 INFO  [MemStoreFlusher.33]
> regionserver.MemStoreFlusher: MemStoreFlusher.33 exiting
> 2016-05-29 19:54:16,341 INFO  [MemStoreFlusher.25]
> regionserver.MemStoreFlusher: MemStoreFlusher.25 exiting
> 2016-05-29 19:54:16,620 INFO  [MemStoreFlusher.31]
> regionserver.MemStoreFlusher: MemStoreFlusher.31 exiting
> 2016-05-29 19:54:16,621 INFO  [MemStoreFlusher.29]
> regionserver.MemStoreFlusher: MemStoreFlusher.29 exiting
> 2016-05-29 19:54:16,621 INFO  [MemStoreFlusher.23]
> regionserver.MemStoreFlusher: MemStoreFlusher.23 exiting
> 2016-05-29 19:54:16,621 INFO  [MemStoreFlusher.32]
> regionserver.MemStoreFlusher: MemStoreFlusher.32 exiting
> 2016-05-29 19:54:16,621 INFO  [MemStoreFlusher.1]
> regionserver.MemStoreFlusher: MemStoreFlusher.1 exiting
> 2016-05-29 19:54:16,621 INFO  [MemStoreFlusher.38]
> regionserver.MemStoreFlusher: MemStoreFlusher.38 exiting
> 2016-05-29 19:54:16,621 INFO  [MemStoreFlusher.10]
> regionserver.MemStoreFlusher: MemStoreFlusher.10 exiting
> 2016-05-29 19:54:16,620 INFO  [MemStoreFlusher.7]
> regionserver.MemStoreFlusher: MemStoreFlusher.7 exiting
> 2016-05-29 19:54:16,620 INFO  [MemStoreFlusher.12]
> regionserver.MemStoreFlusher: MemStoreFlusher.12 exiting
> 2016-05-29 19:54:16,620 INFO  [MemStoreFlusher.21]
> regionserver.MemStoreFlusher: MemStoreFlusher.21 exiting
> 2016-05-29 19:54:16,622 INFO  [MemStoreFlusher.37]
> regionserver.MemStoreFlusher: MemStoreFlusher.37 exiting
> 2016-05-29 19:54:16,622 INFO  [MemStoreFlusher.24]
> regionserver.MemStoreFlusher: MemStoreFlusher.24 exiting
> 2016-05-29 19:54:16,806 INFO  [MemStoreFlusher.39]
> regionserver.MemStoreFlusher: MemStoreFlusher.39 exiting
> 2016-05-29 19:54:16,908 INFO  [MemStoreFlusher.17]
> regionserver.MemStoreFlusher: MemStoreFlusher.17 exiting
> 2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.3]
> regionserver.MemStoreFlusher: MemStoreFlusher.3 exiting
> 2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.27]
> regionserver.MemStoreFlusher: MemStoreFlusher.27 exiting
> 2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.18]
> regionserver.MemStoreFlusher: MemStoreFlusher.18 exiting
> 2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.4]
> regionserver.MemStoreFlusher: MemStoreFlusher.4 exiting
> 2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.19]
> regionserver.MemStoreFlusher: MemStoreFlusher.19 exiting
> 2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.22]
> regionserver.MemStoreFlusher: MemStoreFlusher.22 exiting
> 2016-05-29 19:54:16,952 INFO  [MemStoreFlusher.9]
> regionserver.MemStoreFlusher: MemStoreFlusher.9 exiting
> 2016-05-29 19:54:17,137 INFO  [MemStoreFlusher.36]
> regionserver.MemStoreFlusher: MemStoreFlusher.36 exiting
> 2016-05-29 19:54:17,263 INFO  [MemStoreFlusher.11]
> regionserver.MemStoreFlusher: MemStoreFlusher.11 exiting
> 2016-05-29 19:54:17,499 INFO  [MemStoreFlusher.14]
> regionserver.MemStoreFlusher: MemStoreFlusher.14 exiting
> 2016-05-29 19:54:17,499 INFO  [MemStoreFlusher.2]
> regionserver.MemStoreFlusher: MemStoreFlusher.2 exiting
> 2016-05-29 19:54:17,499 INFO  [MemStoreFlusher.26]
> regionserver.MemStoreFlusher: MemStoreFlusher.26 exiting
> 2016-05-29 19:54:17,499 INFO  [MemStoreFlusher.0]
> regionserver.MemStoreFlusher: MemStoreFlusher.0 exiting
> 2016-05-29 19:54:17,499 INFO  [MemStoreFlusher.30]
> regionserver.MemStoreFlusher: MemStoreFlusher.30 exiting
> 2016-05-29 19:54:17,499 INFO  [MemStoreFlusher.28]
> regionserver.MemStoreFlusher: MemStoreFlusher.28 exiting
>
>
> here is the jstack log when I try to shutdown the region server process
> normally:
> we can see the MemstoreFlusher.34 meet some problems, it can not exiting
> normally
> (By the way, we can see “locked at snappy”, but it not always locked at
> snappy,sometime it will locked at other place,so I think snappy is ok, Here
> I just want to show the MemStoreFlusher meet some problem).
>
>
> "MemStoreFlusher.34" prio=10 tid=0x00007f20b8939000 nid=0x511b runnable
> [0x00007f08941c3000]
>   java.lang.Thread.State: RUNNABLE
> at
>
>
> org.apache.hadoop.io.compress.snappy.SnappyCompressor.compressBytesDirect(Native
> Method)
> at
>
>
> org.apache.hadoop.io.compress.snappy.SnappyCompressor.compress(SnappyCompressor.java:232)
> - locked <0x00007f0a5f85c430> (a
> org.apache.hadoop.io.compress.snappy.SnappyCompressor)
> at
>
>
> org.apache.hadoop.io.compress.BlockCompressorStream.compress(BlockCompressorStream.java:149)
> at
>
>
> org.apache.hadoop.io.compress.BlockCompressorStream.finish(BlockCompressorStream.java:142)
> at
>
>
> org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.compressAfterEncoding(HFileBlockDefaultEncodingContext.java:238)
> at
>
>
> org.apache.hadoop.hbase.io.encoding.HFileBlockDefaultEncodingContext.compressAfterEncodingWithBlockType(HFileBlockDefaultEncodingContext.java:149)
> at
>
>
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlock(HFileBlock.java:783)
> at
>
>
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.ensureBlockReady(HFileBlock.java:761)
> at
>
>
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.finishBlockAndWriteHeaderAndData(HFileBlock.java:878)
> at
>
>
> org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.writeHeaderAndData(HFileBlock.java:864)
> at
>
>
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateBlock(HFileBlockIndex.java:948)
> at
>
>
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIntermediateLevel(HFileBlockIndex.java:931)
> at
>
>
> org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexWriter.writeIndexBlocks(HFileBlockIndex.java:840)
> at
>
>
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:367)
> at
>
>
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:974)
> at
>
>
> org.apache.hadoop.hbase.regionserver.StoreFlusher.finalizeWriter(StoreFlusher.java:69)
> at
>
>
> org.apache.hadoop.hbase.regionserver.DefaultStoreFlusher.flushSnapshot(DefaultStoreFlusher.java:70)
> - locked <0x00007f09fdcccda0> (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.HStore.flushCache(HStore.java:829)
> at
>
>
> org.apache.hadoop.hbase.regionserver.HStore$StoreFlusherImpl.flushCache(HStore.java:1989)
> at
>
>
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1779)
> at
>
>
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1662)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1575)
> at
>
>
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:461)
> at
>
>
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:435)
> at
>
>
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$800(MemStoreFlusher.java:66)
> at
>
>
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:253)
> at java.lang.Thread.run(Thread.java:744)
>
>
> "MemStoreFlusher.33" prio=10 tid=0x00007f20b8937000 nid=0x511a waiting on
> condition [0x00007f08942c5000]
>   java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x00007f09b8e20f80> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> at
>
>
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> at java.util.concurrent.DelayQueue.poll(DelayQueue.java:264)
> at java.util.concurrent.DelayQueue.poll(DelayQueue.java:68)
> at
>
>
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:233)
> at java.lang.Thread.run(Thread.java:744)
>
>
>
>
>
>   I do not know why the memstore cannot flush.
>   Is this a bug?
>
>
>
> 安全提示:本邮件非QUNAR内部邮件,请注意保护个人及公司信息安全,如有索取帐号密码等可疑情况请向
secteam发送邮件
>
>
>
>
>
> 安全提示:本邮件非QUNAR内部邮件,请注意保护个人及公司信息安全,如有索取帐号密码等可疑情况请向
secteam发送邮件
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message