hbase-user mailing list archives

From Saad Mufti <saad.mu...@gmail.com>
Subject Re: Bucket Cache Failure In HBase 1.3.1
Date Thu, 01 Mar 2018 02:35:07 GMT
Thanks, see my other reply. We have a patch from the vendor, but until it
gets promoted to open source we still don't know the real underlying cause.
You're right, though: the cache got disabled due to too many I/O errors in a
short timespan.
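
For anyone who wants to widen that window, the duration looks tunable in
hbase-site.xml; a minimal sketch, assuming the key mirrors the
ioErrorsTolerationDuration field in the snippet Ted posted below (the exact
name and the default are worth verifying against the 1.3.1 source):

    <property>
      <!-- assumed key and default, inferred from BucketCache's
           ioErrorsTolerationDuration field -->
      <name>hbase.bucketcache.ioengine.errors.tolerated.duration</name>
      <value>60000</value>
    </property>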

Cheers.

----
Saad


On Mon, Feb 26, 2018 at 12:24 AM, ramkrishna vasudevan
<ramkrishna.s.vasudevan@gmail.com> wrote:

> From the logs, it seems there was some issue with the file that was used
> by the bucket cache. Probably the volume where the file was mounted had
> some issues.
> If you can confirm that, then this issue should be pretty straightforward.
> If not, let us know and we can help.
>
> Regards
> Ram
>
> On Sun, Feb 25, 2018 at 9:40 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > Here is related code for disabling bucket cache:
> >
> >     if (this.ioErrorStartTime > 0) {
> >       if (cacheEnabled && (now - ioErrorStartTime) > this.ioErrorsTolerationDuration) {
> >         LOG.error("IO errors duration time has exceeded " + ioErrorsTolerationDuration +
> >           "ms, disabling cache, please check your IOEngine");
> >         disableCache();
> >
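> > For context, a simplified sketch of the pattern around that snippet (the
> > field names come from the code above; the method name and structure are my
> > paraphrase, not verbatim 1.3.1 source):
> >
> >     // Sketch: the first IO error starts a clock; the cache is disabled
> >     // only if errors persist past ioErrorsTolerationDuration. (A later
> >     // successful IO resets ioErrorStartTime; not shown here.)
> >     private void checkIOErrorIsTolerated() {
> >       long now = System.currentTimeMillis();
> >       if (this.ioErrorStartTime > 0) {
> >         if (cacheEnabled && (now - ioErrorStartTime) > this.ioErrorsTolerationDuration) {
> >           LOG.error("IO errors duration time has exceeded " + ioErrorsTolerationDuration +
> >               "ms, disabling cache, please check your IOEngine");
> >           disableCache();
> >         }
> >       } else {
> >         this.ioErrorStartTime = now;  // first error of a possible streak
> >       }
> >     }
> >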
> > Can you search in the region server log to see if the above occurred?
> >
> > Was this server the only one with a disabled cache?
> >
> > Cheers
> >
> > On Sun, Feb 25, 2018 at 6:20 AM, Saad Mufti <saad.mufti@oath.com.invalid>
> > wrote:
> >
> > > Hi,
> > >
> > > I am running an HBase 1.3.1 cluster on AWS EMR. The bucket cache is
> > > configured to use two attached EBS disks of 50 GB each, and I provisioned
> > > the bucket cache at a total of 98 GB per instance, a bit less than the
> > > combined capacity, to be on the safe side. My tables have column families
> > > set to prefetch on open.
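> > >
> > > (For reference, we enable that per column family; a rough sketch with the
> > > 1.x client API, where table "my_table" and family "d" are placeholders,
> > > not our real schema:)
> > >
> > >     import org.apache.hadoop.conf.Configuration;
> > >     import org.apache.hadoop.hbase.HBaseConfiguration;
> > >     import org.apache.hadoop.hbase.HColumnDescriptor;
> > >     import org.apache.hadoop.hbase.HTableDescriptor;
> > >     import org.apache.hadoop.hbase.TableName;
> > >     import org.apache.hadoop.hbase.client.Admin;
> > >     import org.apache.hadoop.hbase.client.Connection;
> > >     import org.apache.hadoop.hbase.client.ConnectionFactory;
> > >     import org.apache.hadoop.hbase.util.Bytes;
> > >
> > >     Configuration conf = HBaseConfiguration.create();
> > >     try (Connection conn = ConnectionFactory.createConnection(conf);
> > >          Admin admin = conn.getAdmin()) {
> > >       TableName table = TableName.valueOf("my_table");
> > >       // Fetch the existing descriptor so other family settings survive.
> > >       HTableDescriptor desc = admin.getTableDescriptor(table);
> > >       HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("d"));
> > >       cf.setPrefetchBlocksOnOpen(true);  // prefetch blocks when regions open
> > >       admin.modifyColumn(table, cf);
> > >     }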
> > >
> > > On some instances during cluster startup, the bucket cache starts
> > > throwing errors, and eventually it gets completely disabled on that
> > > instance. The instance still stays up as a valid region server, and the
> > > only clue in the region server UI is that the bucket cache tab reports a
> > > count of 0 and a size of 0 bytes.
> > >
> > > I have already opened a ticket with AWS to see if there are problems
> > > with the EBS volumes, but I wanted to tap the open source community's
> > > hive-mind to see what kind of problem would cause the bucket cache to
> > > get disabled. If the application depends on the bucket cache for
> > > performance, wouldn't it be better to just remove that region server
> > > from the pool if its bucket cache cannot be recovered/enabled?
> > >
> > > The errors look like the following. Would appreciate any insight, thanks:
> > >
> > > 2018-02-25 01:12:47,780 ERROR [hfile-prefetch-1519513834057] bucket.BucketCache: Failed reading block 332b0634287f4c42851bc1a55ffe4042_1348128 from bucket cache
> > > java.nio.channels.ClosedByInterruptException
> > >         at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
> > >         at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:746)
> > >         at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:727)
> > >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine$FileReadAccessor.access(FileIOEngine.java:219)
> > >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.accessFile(FileIOEngine.java:170)
> > >         at org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.read(FileIOEngine.java:105)
> > >         at org.apache.hadoop.hbase.io.hfile.bucket.BucketCache.getBlock(BucketCache.java:492)
> > >         at org.apache.hadoop.hbase.io.hfile.CombinedBlockCache.getBlock(CombinedBlockCache.java:84)
> > >         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.getCachedBlock(HFileReaderV2.java:279)
> > >         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:420)
> > >         at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$1.run(HFileReaderV2.java:209)
> > >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> > >         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> > >         at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> > >         at java.lang.Thread.run(Thread.java:748)
> > >
> > > and
> > >
> > > 2018-02-25 01:12:52,432 ERROR [regionserver/
> > > ip-xx-xx-xx-xx.xx-xx-xx.us-east-1.ec2.xx.net/xx.xx.xx.xx:
> > > 16020-BucketCacheWriter-7]
> > > bucket.BucketCache: Failed writing to bucket cache
> > > java.nio.channels.ClosedChannelException
> > >         at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.
> > java:110)
> > >         at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:758)
> > >         at
> > > org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine$
> > > FileWriteAccessor.access(FileIOEngine.java:227)
> > >         at
> > > org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.
> > > accessFile(FileIOEngine.java:170)
> > >         at
> > > org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.
> > > write(FileIOEngine.java:116)
> > >         at
> > > org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$
> > > RAMQueueEntry.writeToCache(BucketCache.java:1357)
> > >         at
> > > org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$
> > WriterThread.doDrain(
> > > BucketCache.java:883)
> > >         at
> > > org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$
> > > WriterThread.run(BucketCache.java:838)
> > >         at java.lang.Thread.run(Thread.java:748)
> > >
> > > and later
> > > 2018-02-25 01:13:47,783 INFO  [regionserver/
> > > ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.
> > > 194.246.70:16020-BucketCacheWriter-4]
> > > bucket.BucketCach
> > > e: regionserver/
> > > ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.
> > > 194.246.70:16020-BucketCacheWriter-4
> > > exiting, cacheEnabled=false
> > > 2018-02-25 01:13:47,864 WARN  [regionserver/
> > > ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.
> > > 194.246.70:16020-BucketCacheWriter-6]
> > > bucket.FileIOEngi
> > > ne: Failed syncing data to /mnt1/hbase/bucketcache
> > > 2018-02-25 01:13:47,864 ERROR [regionserver/
> > > ip-10-194-246-70.aolp-ds-dev.us-east-1.ec2.aolcloud.net/10.
> > > 194.246.70:16020-BucketCacheWriter-6]
> > > bucket.BucketCach
> > > e: Failed syncing IO engine
> > > java.nio.channels.ClosedChannelException
> > >         at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.
> > java:110)
> > >         at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:379)
> > >         at
> > > org.apache.hadoop.hbase.io.hfile.bucket.FileIOEngine.
> > > sync(FileIOEngine.java:128)
> > >         at
> > > org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$
> > WriterThread.doDrain(
> > > BucketCache.java:911)
> > >         at
> > > org.apache.hadoop.hbase.io.hfile.bucket.BucketCache$
> > > WriterThread.run(BucketCache.java:838)
> > >         at java.lang.Thread.run(Thread.java:748)
> > >
> > > ----
> > > Saad
> > >
> >
>
