Heng:

Can you pastebin the complete stack trace for the region server? A snippet from the region server log may also provide more clues.

Thanks

On Wed, May 25, 2016 at 9:48 PM, Heng Chen wrote:

> On the master web UI, I can see that region c371fb20c372b8edbf54735409ab5c4a
> is stuck in the failed close state, so the balancer cannot run.
>
> I checked the region on the RS and found these log lines about it:
>
> 2016-05-26 12:42:10,490 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Waited 90447ms on a compaction to clean up 'too many store files'; waited long enough... proceeding with flush of frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
> 2016-05-26 12:42:20,043 INFO [dx-pipe-regionserver4-online,16020,1464166626969_ChoreService_1] regionserver.HRegionServer: dx-pipe-regionserver4-online,16020,1464166626969-MemstoreFlusherChore requesting flush for region frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a. after a delay of 20753
> 2016-05-26 12:42:30,043 INFO [dx-pipe-regionserver4-online,16020,1464166626969_ChoreService_1] regionserver.HRegionServer: dx-pipe-regionserver4-online,16020,1464166626969-MemstoreFlusherChore requesting flush for region frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a. after a delay of 7057
>
> The related jstack output is below:
>
> Thread 12403 (RS_CLOSE_REGION-dx-pipe-regionserver4-online:16020-2):
>   State: WAITING
>   Blocked count: 1
>   Waited count: 2
>   Waiting on org.apache.hadoop.hbase.regionserver.HRegion$WriteState@1390594c
>   Stack:
>     java.lang.Object.wait(Native Method)
>     java.lang.Object.wait(Object.java:502)
>     org.apache.hadoop.hbase.regionserver.HRegion.waitForFlushesAndCompactions(HRegion.java:1512)
>     org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1371)
>     org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1336)
>     org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
>     org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
>     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     java.lang.Thread.run(Thread.java:745)
>
> Our HBase cluster version is 1.1.1. I tried to compact this region, but the
> compaction is stuck at 89.58% progress:
>
> frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.  85860221  85860221  89.58%
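The jstack in this thread shows the RS_CLOSE_REGION handler parked in HRegion.waitForFlushesAndCompactions(), waiting on the region's HRegion$WriteState monitor. The sketch below is a simplified, hypothetical model of that wait pattern, not HBase source: the class WriteStateModel, its methods, and the timeout parameter are all assumptions added for illustration (the trace above appears to show an untimed Object.wait()). It shows why a compaction that never finishes keeps the close, and per the report above the balancer, blocked.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical, simplified model of a region's write state (NOT HBase source).
 * A close must wait on this monitor until no flush or compaction is running.
 */
public class WriteStateModel {
    private boolean flushing = false;
    private int compacting = 0;

    public synchronized void startCompaction() { compacting++; }

    public synchronized void finishCompaction() {
        compacting--;
        notifyAll(); // wake any thread waiting to close the region
    }

    public synchronized void startFlush() { flushing = true; }

    public synchronized void finishFlush() {
        flushing = false;
        notifyAll();
    }

    /**
     * Waits until no flush or compaction is in progress, or until timeoutMs elapses.
     * Returns true if the region became quiescent, false on timeout.
     * The timeout is an assumption for this sketch; an untimed wait() would block forever.
     */
    public synchronized boolean waitForFlushesAndCompactions(long timeoutMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        // Loop guards against spurious wakeups and against notifications that
        // do not bring both conditions to the quiescent state.
        while (compacting > 0 || flushing) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMs <= 0) {
                return false; // must not call wait(0), which means "wait forever"
            }
            wait(remainingMs);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        WriteStateModel state = new WriteStateModel();
        // A compaction starts and never finishes: the close cannot proceed.
        state.startCompaction();
        System.out.println("closed while stuck: " + state.waitForFlushesAndCompactions(200));
        // Once the compaction finishes (here, from another thread), the close proceeds.
        Thread finisher = new Thread(state::finishCompaction);
        finisher.start();
        System.out.println("closed after finish: " + state.waitForFlushesAndCompactions(5000));
        finisher.join();
    }
}
```

Running main prints "closed while stuck: false" followed by "closed after finish: true". In this model, the only way out of the wait is for the in-flight compaction to decrement its counter and notify, which is consistent with the close staying blocked while the compaction on c371fb20c372b8edbf54735409ab5c4a sits at 89.58%.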