There are debug logs in HRegion#replayWALFlushStartMarker:

LOG.debug(getRegionInfo().getEncodedName() + " : "
    + " Prepared flush with seqId:" + flush.getFlushSequenceNumber());
...
LOG.debug(getRegionInfo().getEncodedName() + " : "
    + " Prepared empty flush with seqId:" + flush.getFlushSequenceNumber());
I searched for them in the log you attached to HBASE-15900 but didn't find
any occurrence.
FYI
On Mon, May 30, 2016 at 2:59 AM, Heng Chen <heng.chen.1986@gmail.com> wrote:
> I found something useful.
>
> When we call region.close, if a compaction or flush is in progress,
> close will wait for the compaction or flush to finish.
>
> {code: title=HRegion.java}
> @Override
> public void waitForFlushesAndCompactions() {
>   synchronized (writestate) {
>     if (this.writestate.readOnly) {
>       // we should not wait for replayed flushed if we are read only (for example in case the
>       // region is a secondary replica).
>       return;
>     }
>     boolean interrupted = false;
>     try {
>       while (writestate.compacting > 0 || writestate.flushing) {
>         LOG.debug("waiting for " + writestate.compacting + " compactions"
>             + (writestate.flushing ? " & cache flush" : "") + " to complete for region " + this);
>         try {
>           writestate.wait();
>         } catch (InterruptedException iex) {
>           // essentially ignore and propagate the interrupt back up
>           LOG.warn("Interrupted while waiting");
>           interrupted = true;
>         }
>       }
>     } finally {
>       if (interrupted) {
>         Thread.currentThread().interrupt();
>       }
>     }
>   }
> }
> {code}
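>
> Note the other half of this handshake: whichever path finishes a flush
> must clear writestate.flushing and notify the monitor, otherwise close
> stays blocked. Roughly like this (a simplified sketch, not the exact
> 1.1.1 code):
>
> {code: title=sketch: flush completion must wake the waiters}
> // Simplified sketch of what the flush path has to do when it finishes,
> // so waitForFlushesAndCompactions() can exit its wait loop.
> synchronized (writestate) {
>   writestate.flushing = false; // lets the while-condition above turn false
>   writestate.notifyAll();      // wakes region.close blocked in writestate.wait()
> }
> {code}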
>
> And writestate.flushing will be set to true in two places:
>
> HRegion.flushCache and HRegion.replayWALFlushStartMarker
>
> {code: title=HRegion.flushCache}
> synchronized (writestate) {
>   if (!writestate.flushing && writestate.writesEnabled) {
>     this.writestate.flushing = true;
>   } else {
>     // ...
>   }
> }
> {code}
>
> {code: title=HRegion.replayWALFlushStartMarker}
> synchronized (writestate) {
>   try {
>     // ...
>     if (!writestate.flushing) {
>       this.writestate.flushing = true;
>       // ...
>     }
>     // ...
> {code}
>
>
> Notice that, in HRegion.replayWALFlushStartMarker, we did not check
> writestate.writesEnabled before setting writestate.flushing to true.
>
> So if region.close wakes up from writestate.wait but the lock is
> acquired first by HRegion.replayWALFlushStartMarker, flushing will be
> set to true again, and region.close will be stuck in writestate.wait
> forever.
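>
> If that is really what happens, mirroring the writesEnabled check from
> HRegion.flushCache inside replayWALFlushStartMarker might avoid it.
> Just a sketch of the idea, not a tested patch:
>
> {code: title=sketch: guard the replayed flush}
> synchronized (writestate) {
>   // Also require writesEnabled, as flushCache does, so a closing region
>   // (writes disabled) can never flip flushing back to true.
>   if (!writestate.flushing && writestate.writesEnabled) {
>     this.writestate.flushing = true;
>     // ... prepare the replayed flush as before ...
>   }
> }
> {code}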
>
>
> Could this actually happen in practice?
>
>
> 2016-05-27 10:44 GMT+08:00 Heng Chen <heng.chen.1986@gmail.com>:
>
> > Thanks guys. Yesterday I restarted the related RS and the failed-close
> > region reopened successfully. But today another region has fallen into
> > this state.
> >
> > I pasted the related RS's jstack information. This time the failed-close
> > region is 9368190b3ba46238534b6307702aabae.
> >
> > 2016-05-26 21:50 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> >
> >> Heng:
> >> Can you pastebin the complete stack trace for the region server?
> >>
> >> A snippet from the region server log may also provide more clues.
> >>
> >> Thanks
> >>
> >> On Wed, May 25, 2016 at 9:48 PM, Heng Chen <heng.chen.1986@gmail.com>
> >> wrote:
> >>
> >> > On the master web UI, I could see region c371fb20c372b8edbf54735409ab5c4a
> >> > always in the failed-close state, so the balancer could not run.
> >> >
> >> >
> >> > I checked the region on the RS and found logs about this region:
> >> >
> >> > 2016-05-26 12:42:10,490 INFO [MemStoreFlusher.1] regionserver.MemStoreFlusher: Waited 90447ms on a compaction to clean up 'too many store files'; waited long enough... proceeding with flush of frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
> >> > 2016-05-26 12:42:20,043 INFO [dx-pipe-regionserver4-online,16020,1464166626969_ChoreService_1] regionserver.HRegionServer: dx-pipe-regionserver4-online,16020,1464166626969-MemstoreFlusherChore requesting flush for region frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a. after a delay of 20753
> >> > 2016-05-26 12:42:30,043 INFO [dx-pipe-regionserver4-online,16020,1464166626969_ChoreService_1] regionserver.HRegionServer: dx-pipe-regionserver4-online,16020,1464166626969-MemstoreFlusherChore requesting flush for region frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a. after a delay of 7057
> >> >
> >> >
> >> > The related jstack information is below:
> >> >
> >> > Thread 12403 (RS_CLOSE_REGION-dx-pipe-regionserver4-online:16020-2):
> >> >   State: WAITING
> >> >   Blocked count: 1
> >> >   Waited count: 2
> >> >   Waiting on org.apache.hadoop.hbase.regionserver.HRegion$WriteState@1390594c
> >> >   Stack:
> >> >     java.lang.Object.wait(Native Method)
> >> >     java.lang.Object.wait(Object.java:502)
> >> >     org.apache.hadoop.hbase.regionserver.HRegion.waitForFlushesAndCompactions(HRegion.java:1512)
> >> >     org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1371)
> >> >     org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1336)
> >> >     org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
> >> >     org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> >> >     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >> >     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >> >     java.lang.Thread.run(Thread.java:745)
> >> >
> >> >
> >> > Our HBase cluster version is 1.1.1. I tried to compact this region,
> >> > but the compaction is stuck at 89.58% progress:
> >> >
> >> > frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
> >> > 85860221 85860221 89.58%
> >> >
> >>
> >
> >
>