hbase-user mailing list archives

From Saad Mufti <saad.mu...@gmail.com>
Subject Re: Bucket Cache Failure In HBase 1.3.1
Date Thu, 01 Mar 2018 02:33:57 GMT
Thanks for the feedback. You're right: the bucket cache is getting disabled
because of too many I/O errors from the underlying files that back it. We
still don't know the exact root cause, but we are working with our vendor to
test a patch they provided, which seems to have resolved the issue for now.
They say that if it holds up, they will eventually try to get the patch into
the open source releases.
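For anyone else who runs into this: if I'm reading CacheConfig/BucketCache in
1.3 correctly, the window your snippet checks is controlled by
hbase.bucketcache.ioengine.errors.tolerated.duration (default 60000 ms, I
believe), so it can be raised in hbase-site.xml to buy time while the storage
issue is investigated. A rough sketch of the relevant settings, with
illustrative values and paths rather than our exact production config:

    <!-- Sketch only: illustrative values, not our production settings. -->
    <property>
      <!-- File-backed bucket cache; the path should live on the attached EBS volume. -->
      <name>hbase.bucketcache.ioengine</name>
      <value>file:/mnt1/hbase/bucketcache</value>
    </property>
    <property>
      <!-- Cache capacity; if I remember right, values above 1.0 are treated as megabytes (98 GB here). -->
      <name>hbase.bucketcache.size</name>
      <value>100352</value>
    </property>
    <property>
      <!-- How long BucketCache tolerates continuous IO errors before disableCache() fires. -->
      <name>hbase.bucketcache.ioengine.errors.tolerated.duration</name>
      <value>120000</value>
    </property>

Of course raising the toleration window is just a band-aid; it doesn't address
whatever is closing the file channels underneath.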

Cheers.

----
Saad


On Sun, Feb 25, 2018 at 11:10 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> Here is related code for disabling bucket cache:
>
>     if (this.ioErrorStartTime > 0) {
>       if (cacheEnabled && (now - ioErrorStartTime) > this.ioErrorsTolerationDuration) {
>         LOG.error("IO errors duration time has exceeded " + ioErrorsTolerationDuration +
>           "ms, disabling cache, please check your IOEngine");
>         disableCache();
>
> Can you search the region server log to see if the above occurred?
>
> Was this server the only one with a disabled cache?
>
> Cheers
>
> On Sun, Feb 25, 2018 at 6:20 AM, Saad Mufti <saad.mufti@oath.com.invalid>
> wrote:
>
> > Hi,
> >
> > I am running an HBase 1.3.1 cluster on AWS EMR. The bucket cache is
> > configured to use two attached EBS disks of 50 GB each, and I provisioned
> > it a bit below the raw capacity, at 98 GB per instance, to be on the safe
> > side. My tables have column families set to prefetch on open.
> >
> > On some instances, the bucket cache starts throwing errors during cluster
> > startup, and eventually it gets completely disabled on that instance. The
> > instance stays up as a valid region server, and the only clue in the
> > region server UI is that the bucket cache tab reports a count of 0 and a
> > size of 0 bytes.
> >
> > I have already opened a ticket with AWS to see if there are problems with
> > the EBS volumes, but I wanted to tap the open source community's hive mind
> > to see what kind of problem would cause the bucket cache to get disabled.
> > If the application depends on the bucket cache for performance, wouldn't
> > it be better to just remove that region server from the pool when its
> > bucket cache cannot be recovered/re-enabled?
> >
> > The errors look like the following. Would appreciate any insight, thanks:
> >
> > 2018-02-25 01:12:47,780 ERROR [hfile-prefetch-1519513834057] bucket.BucketCache: Failed reading block 332b0634287f4c42851bc1a55ffe4042_1348128 from bucket cache
> > java.nio.channels.ClosedByInterruptException
> >         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> >         at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:746)
> >         at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:727)
> >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine$FileReadAccessor.access(FileIOEngine.java:219)
> >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.accessFile(FileIOEngine.java:170)
> >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.read(FileIOEngine.java:105)
> >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:492)
> >         at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:84)
> >         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.getCachedBlock(HFileReaderV2.java:279)
> >         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:420)
> >         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$1.run(HFileReaderV2.java:209)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> >         at java.lang.Thread.run(Thread.java:748)
> > and
> >
> > 2018-02-25 01:12:52,432 ERROR [regionserver/ip-xx-xx-xx-xx.xx-xx-xx.us-east-1.ec2.xx.net/xx.xx.xx.xx:16020-BucketCacheWriter-7] bucket.BucketCache: Failed writing to bucket cache
> > java.nio.channels.ClosedChannelException
> >         at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
> >         at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:758)
> >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine$FileWriteAccessor.access(FileIOEngine.java:227)
> >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.accessFile(FileIOEngine.java:170)
> >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.write(FileIOEngine.java:116)
> >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$RAMQueueEntry.writeToCache(BucketCache.java:1357)
> >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:883)
> >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:838)
> >         at java.lang.Thread.run(Thread.java:748)
> >
> > and later
> > 2018-02-25 01:13:47,783 INFO  [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-4] bucket.BucketCache: regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-4 exiting, cacheEnabled=false
> > 2018-02-25 01:13:47,864 WARN  [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-6] bucket.FileIOEngine: Failed syncing data to /mnt1/hbase/bucketcache
> > 2018-02-25 01:13:47,864 ERROR [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-6] bucket.BucketCache: Failed syncing IO engine
> > java.nio.channels.ClosedChannelException
> >         at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
> >         at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:379)
> >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.sync(FileIOEngine.java:128)
> >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:911)
> >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:838)
> >         at java.lang.Thread.run(Thread.java:748)
> >
> > ----
> > Saad
> >
>
