hbase-user mailing list archives

From Saad Mufti <saad.mu...@gmail.com>
Subject Re: Bucket Cache Failure In HBase 1.3.1
Date Thu, 01 Mar 2018 03:08:26 GMT
I think it is for HBase itself. But I'll have to wait for more details as
they haven't shared the source code with us. I imagine they want to do a
bunch more testing and other process stuff.

----
Saad

On Wed, Feb 28, 2018 at 9:45 PM Ted Yu <yuzhihong@gmail.com> wrote:

> Did the vendor say whether the patch is for hbase or some other component ?
>
> Thanks
>
> On Wed, Feb 28, 2018 at 6:33 PM, Saad Mufti <saad.mufti@gmail.com> wrote:
>
> > Thanks for the feedback. You guys are right: the bucket cache is getting
> > disabled due to too many I/O errors from the underlying files that make up
> > the bucket cache. We still do not know the exact underlying cause, but we
> > are working with our vendor to test a patch they provided that seems to
> > have resolved the issue for now. They say that if it works out well, they
> > will eventually try to promote the patch to the open source versions.
> >
> > Cheers.
> >
> > ----
> > Saad
> >
> >
> > On Sun, Feb 25, 2018 at 11:10 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> > > Here is related code for disabling bucket cache:
> > >
> > >     if (this.ioErrorStartTime > 0) {
> > >       if (cacheEnabled && (now - ioErrorStartTime) > this.ioErrorsTolerationDuration) {
> > >         LOG.error("IO errors duration time has exceeded " + ioErrorsTolerationDuration +
> > >           "ms, disabling cache, please check your IOEngine");
> > >         disableCache();
> > >       }
> > >     }
> > > Can you search in the region server log to see if the above occurred ?
> > >
> > > Was this server the only one with disabled cache ?
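> > >
> > > For reference, that toleration window is configurable. Here is a minimal
> > > sketch of reading it the way I believe 1.3.x does; the key name and the
> > > 60000 ms default are from my recollection of the branch-1 source, so
> > > please verify against your build before relying on it:
> > >
> > >     import org.apache.hadoop.conf.Configuration;
> > >     import org.apache.hadoop.hbase.HBaseConfiguration;
> > >
> > >     public class TolerationWindow {
> > >       public static void main(String[] args) {
> > >         Configuration conf = HBaseConfiguration.create();
> > >         // Sketch: the key (assumed from branch-1 source) that controls how
> > >         // long IO errors are tolerated before disableCache() fires.
> > >         int tolerationMs = conf.getInt(
> > >             "hbase.bucketcache.ioengine.errors.tolerated.duration",
> > >             60 * 1000);
> > >         System.out.println("ioErrorsTolerationDuration = " + tolerationMs + " ms");
> > >       }
> > >     }
> > >
> > > Raising that value only buys time to debug; it does not fix whatever is
> > > breaking the underlying file channels.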
> > >
> > > Cheers
> > >
> > > On Sun, Feb 25, 2018 at 6:20 AM, Saad Mufti <saad.mufti@oath.com.invalid>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I am running an HBase 1.3.1 cluster on AWS EMR. The bucket cache is
> > > > configured to use two attached EBS disks of 50 GB each, and I
> > > > provisioned the bucket cache at 98 GB per instance, a bit less than the
> > > > total, to be on the safe side. My tables have their column families set
> > > > to prefetch on open.
> > > >
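> > > > To be concrete about the prefetch setting, this is roughly how the
> > > > flag is set on a family (a minimal sketch using the 1.x client API;
> > > > the table and family names here are made up):
> > > >
> > > >     import org.apache.hadoop.hbase.HColumnDescriptor;
> > > >     import org.apache.hadoop.hbase.HTableDescriptor;
> > > >     import org.apache.hadoop.hbase.TableName;
> > > >
> > > >     public class PrefetchOnOpen {
> > > >       public static void main(String[] args) {
> > > >         // Hypothetical names; only the prefetch flag matters here.
> > > >         HTableDescriptor table = new HTableDescriptor(TableName.valueOf("my_table"));
> > > >         HColumnDescriptor cf = new HColumnDescriptor("d");
> > > >         // This flag is what kicks off the hfile-prefetch-* threads seen
> > > >         // in the first stack trace below when a region opens.
> > > >         cf.setPrefetchBlocksOnOpen(true);
> > > >         table.addFamily(cf);
> > > >       }
> > > >     }
> > > >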
> > > > On some instances during cluster startup, the bucket cache starts
> > > > throwing errors, and eventually the bucket cache gets completely
> > > > disabled on the instance. The instance still stays up as a valid region
> > > > server, and the only clue in the region server UI is that the bucket
> > > > cache tab reports a count of 0 and a size of 0 bytes.
> > > >
> > > > I have already opened a ticket with AWS to see if there are problems
> > > > with the EBS volumes, but I wanted to tap the open source community's
> > > > hive-mind to see what kind of problem would cause the bucket cache to
> > > > get disabled. If the application depends on the bucket cache for
> > > > performance, wouldn't it be better to just remove that region server
> > > > from the pool if its bucket cache cannot be recovered/enabled?
> > > >
> > > > The errors look like the following. I would appreciate any insight,
> > > > thanks:
> > > >
> > > > 2018-02-25 01:12:47,780 ERROR [hfile-prefetch-1519513834057]
> > > > bucket.BucketCache: Failed reading block
> > > > 332b0634287f4c42851bc1a55ffe4042_1348128 from bucket cache
> > > > java.nio.channels.ClosedByInterruptException
> > > >         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> > > >         at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:746)
> > > >         at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:727)
> > > >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine$FileReadAccessor.access(FileIOEngine.java:219)
> > > >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.accessFile(FileIOEngine.java:170)
> > > >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.read(FileIOEngine.java:105)
> > > >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:492)
> > > >         at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:84)
> > > >         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.getCachedBlock(HFileReaderV2.java:279)
> > > >         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:420)
> > > >         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$1.run(HFileReaderV2.java:209)
> > > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > > >         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> > > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> > > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > > >         at java.lang.Thread.run(Thread.java:748)
> > > >
> > > > and
> > > >
> > > > 2018-02-25 01:12:52,432 ERROR [regionserver/ip-xx-xx-xx-xx.xx-xx-xx.us-east-1.ec2.xx.net/xx.xx.xx.xx:16020-BucketCacheWriter-7]
> > > > bucket.BucketCache: Failed writing to bucket cache
> > > > java.nio.channels.ClosedChannelException
> > > >         at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
> > > >         at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:758)
> > > >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine$FileWriteAccessor.access(FileIOEngine.java:227)
> > > >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.accessFile(FileIOEngine.java:170)
> > > >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.write(FileIOEngine.java:116)
> > > >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$RAMQueueEntry.writeToCache(BucketCache.java:1357)
> > > >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:883)
> > > >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:838)
> > > >         at java.lang.Thread.run(Thread.java:748)
> > > >
> > > > and later:
> > > >
> > > > 2018-02-25 01:13:47,783 INFO  [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-4]
> > > > bucket.BucketCache: regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-4
> > > > exiting, cacheEnabled=false
> > > > 2018-02-25 01:13:47,864 WARN  [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-6]
> > > > bucket.FileIOEngine: Failed syncing data to /mnt1/hbase/bucketcache
> > > > 2018-02-25 01:13:47,864 ERROR [regionserver/ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.194.246.70:16020-BucketCacheWriter-6]
> > > > bucket.BucketCache: Failed syncing IO engine
> > > > java.nio.channels.ClosedChannelException
> > > >         at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:110)
> > > >         at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:379)
> > > >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.sync(FileIOEngine.java:128)
> > > >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.doDrain(BucketCache.java:911)
> > > >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$WriterThread.run(BucketCache.java:838)
> > > >         at java.lang.Thread.run(Thread.java:748)
> > > >
> > > > ----
> > > > Saad
> > > >
> > >
> >
>
