hbase-user mailing list archives

From ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com>
Subject Re: Cacheblocksonwrite not working during compaction?
Date Mon, 23 Sep 2019 07:19:11 GMT
Hi

I can see your case: after the compaction, your reads end up hitting S3 to
fetch the new blocks. However, please note that in the version you are
using, when a compaction happens, any scans/reads in progress at that
point will still use the existing HFiles that were available when the scan
started, and hence those blocks are not invalidated.

However, any new files created by the compaction, and any new scans that
start after the compaction, will need to fetch those blocks into the cache
on the first read (because we don't cache blocks on write after a
compaction). But you need to be careful: once you enable cache-on-write
for compactions (after your patch), the LRU behaviour may start evicting
other blocks that are needed by other scans running on that region server.
If your use case is that the scan queries repeatedly touch the same set of
files, then enabling cache-on-write after compaction may help - especially
considering that you use a file-mode bucket cache, which is very large, so
evictions may not be common.
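
If it helps to make the discussion concrete, here is a minimal sketch of
the settings involved (the bucket cache path and size are placeholders,
not your values):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CacheOnWriteSettings {
  public static Configuration regionServerConf() {
    Configuration conf = HBaseConfiguration.create();
    // Cache data blocks as they are written by flushes. Note this does
    // NOT apply to compaction writers today, which is the point above.
    conf.setBoolean("hbase.rs.cacheblocksonwrite", true);
    // File-mode bucket cache: a large L2 cache on local disk, so
    // evictions caused by cache-on-write should be rare.
    conf.set("hbase.bucketcache.ioengine", "file:/mnt/hbase/bucketcache");
    conf.setInt("hbase.bucketcache.size", 1048576); // capacity in MB (1 TB)
    return conf;
  }
}
```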

>>I'll plan on opening a JIRA ticket for this and I'd also be happy to take
a stab at creating a patch.
Pls feel free to open a JIRA.

Regards
Ram


On Mon, Sep 23, 2019 at 8:42 AM Jacob LeBlanc <jacob.leblanc@microfocus.com>
wrote:

> My questions were primarily around how cacheblocksonwrite, prefetching,
> and compaction work together, which I think is not AWS specific. Although
> it may be that yes, the 1+ hour prefetching I am seeing is an AWS-specific
> phenomenon.
>
> I've looked at the 1.4.9 source a bit more now that I have a better
> understanding of everything. As you say, cacheDataOnWrite is hardcoded to
> false for compactions, so the hbase.rs.cacheblocksonwrite setting will have
> no effect in these cases.
>
> I also now understand that the cache key is partly based on filename, so
> disabling hbase.rs.evictblocksonclose isn't going to help for compactions
> either since the pre-compaction filenames will no longer be relevant.
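>
> (As a concrete illustration - a minimal sketch assuming the 1.x
> BlockCacheKey API, with shortened, hypothetical file names - the cache
> key is derived from the HFile name plus the block offset, so blocks
> cached under the pre-compaction file name can never be hit again once
> the compacted file takes over:)
>
> ```java
> import org.apache.hadoop.hbase.io.hfile.BlockCacheKey;
>
> // Blocks were cached under the old store file's name...
> BlockCacheKey before = new BlockCacheKey("2fd1f3f9c85a4c5a", 0L);
> // ...but post-compaction reads build keys from the new file's name,
> // so they miss even though the underlying cells are the same.
> BlockCacheKey after = new BlockCacheKey("eede47d55e06454c", 0L);
> assert !before.equals(after);
> ```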
>
> Prefetching also makes more sense now that I've looked at the code. I see
> it comes into effect in HFileReaderV2, so it happens on a per-file basis,
> not per-region. I was confused before about why I was seeing prefetching
> happen when the region had not been opened recently, but now it makes
> sense: it occurs when the compacted file is opened, not the region.
>
> So unfortunately, it looks like I'm sunk in terms of caching data during
> compaction. Thanks for the aid in understanding this.
>
> However, I do think this is a valid use case and also seems like it should
> be fairly easy to implement with a new cache config setting. On the one
> hand there is this nice prefetching feature which is acknowledging the use
> case for when people want to cache entire tables, and this use case is more
> common when considering larger L2 caches. Then on the other hand there is
> this hardcoded setting that is assuming nobody would ever want to cache all
> of the blocks being written during a compaction which seems at odds with
> the use case prefetching is trying to address. Don't get me wrong: I
> understand that in many use cases caching while writing during compaction
> is not desirable in that you don't want to evict blocks that you care about
> during the compaction process. In other words it sort of throws a big
> monkey wrench into the concept of an LRU cache. I also realize that
> hbase.rs.cachedataonwrite is geared more towards flushes for use cases
> where people often read what was recently written and don't necessarily
> want to cache the entire table. But a new config option (call it hbase.rs.cacheblocksoncompaction?)
> to address this specific use case would be nice.
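>
> To sketch the idea (this is only a rough illustration of where the change
> would go, based on my reading of HStore.createWriterInTmp in 1.4.9, not a
> tested patch - the config name is just the proposal above):
>
> ```java
> // Inside HStore.createWriterInTmp(...): today the compaction branch
> // unconditionally disables data block caching for the writer.
> if (isCompaction) {
>   writerCacheConf = new CacheConfig(cacheConf);
>   // Proposed: honor a new opt-in flag instead of hardcoding false.
>   boolean cacheOnCompaction =
>       conf.getBoolean("hbase.rs.cacheblocksoncompaction", false);
>   writerCacheConf.setCacheDataOnWrite(cacheOnCompaction);
> } else {
>   writerCacheConf = cacheConf;
> }
> ```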
>
> I'll plan on opening a JIRA ticket for this and I'd also be happy to take
> a stab at creating a patch.
>
> --Jacob LeBlanc
>
> -----Original Message-----
> From: Vladimir Rodionov [mailto:vladrodionov@gmail.com]
> Sent: Friday, September 20, 2019 10:29 PM
> To: user@hbase.apache.org
> Subject: Re: Cacheblocksonwrite not working during compaction?
>
> You are asking questions on the Apache HBase user forum that are more
> appropriate for an AWS forum, given that you are using an Amazon-specific
> distribution of HBase and an Amazon-specific implementation of the S3
> file system.
>
> As for hbase.rs.cacheblocksonwrite not working: HBase ignores this flag
> and forcefully sets it to false if the file writer is opened by a
> compaction thread (this is true for 2.x, but I am pretty sure it is the
> same in 1.x).
>
> -Vlad
>
> On Fri, Sep 20, 2019 at 4:24 PM Jacob LeBlanc <
> jacob.leblanc@microfocus.com>
> wrote:
>
> > Thank you for the feedback!
> >
> > Our cache size *is* larger than our data size, at least for our
> > heavily accessed tables. Memory may be prohibitively expensive for
> > keeping large tables in an in-memory cache, but storage is cheap, so
> > hosting a 1 TB bucketcache on the local disk of each of our region
> > servers is feasible and that is what we are trying to accomplish.
> >
> > I'm not sure I understand the complexity of populating a cache that is
> > supposed to represent the data in files on disk while writing out one
> > of those files during the compaction process. In fact, that's what I
> > understood the hbase.rs.cacheblocksonwrite to do (based on nothing
> > more than the description of the setting in the online hbase book - I
> > don't see very good documentation online for this feature). If that
> > setting doesn't do that, then what does it do exactly? What about the
> > hbase.rs.evictblocksonclose setting? Could that be evicting all of the
> > blocks that are put in the cache at the end of compaction? What are
> > the implications if we set that to "false"?
> >
> > Prefetching is also OK for us to do on some tables because we are
> > using the on-disk cache (I understand this also means opening a region
> > after a split or move will take longer). But I don't understand why it
> > appeared that prefetching was being done when the region wasn't opened
> > recently. I don't expect prefetching to help us with compactions, but
> > seeing the thread getting blocked after a compaction just raised a red
> > flag that I'm not understanding what is going on.
> >
> > I understand that some latency during compaction is expected, but what
> > we are seeing is fairly extreme. The instances take thread dumps every
> > 15 minutes and we saw threads still in a BLOCKED state on the same
> > input stream object an hour later! This is after a 3.0 GB compaction
> > was already done. If prefetching was happening, then something seems
> > wrong if it takes an hour to populate 3.0 GB worth of data in a local
> > disk cache from S3.
> >
> > I appreciate the help on this!
> >
> > --Jacob LeBlanc
> >
> > -----Original Message-----
> > From: Vladimir Rodionov [mailto:vladrodionov@gmail.com]
> > Sent: Friday, September 20, 2019 6:41 PM
> > To: user@hbase.apache.org
> > Subject: Re: Cacheblocksonwrite not working during compaction?
> >
> > >>- Why is the hbase.rs.cacheblocksonwrite not seeming to work? Does
> > >>it
> > only work for flushing and not for compaction? I can see from the logs
> > that the file is renamed >>after being written. Does that have
> > something to do with why cacheblocksonwrite isn't working?
> >
> > Generally, it is a very bad idea to enable caching on read/write
> > during compaction unless your cache size is larger than your data size
> > (which is not a common case).
> > Cache invalidation during compaction is almost inevitable, due to the
> > complexity of potential optimizations in this case. There are some
> > papers dedicated to smarter cache-invalidation algorithms for
> > LSM-derived storage engines, but engineers, as usual, are much more
> > conservative than academic researchers and are not eager to implement
> > novel (not battle-tested) algorithms.
> > Latency spikes during compaction are normal and inevitable, at least
> > for HBase, especially when one deals with S3 or any other cloud
> > storage. S3 read latency can sometimes reach seconds, and the only
> > possible mitigation for these huge latency spikes is a
> > very-smart-cache-invalidation-during-compaction algorithm (which does
> > not exist yet).
> >
> > For your case, I would recommend the following settings:
> >
> > *CACHE_BLOOM_BLOCKS_ON_WRITE_KEY = true*
> >
> > *CACHE_INDEX_BLOCKS_ON_WRITE_KEY = true*
> >
> > *CACHE_DATA_BLOCKS_ON_WRITE_KEY = false (bad idea to set it to true)*
> >
> >
> >  PREFETCH_BLOCKS_ON_OPEN should be false as well, unless your table is
> > small and your application does this on startup (once)
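> >
> > (If it is easier to apply these per column family, a minimal sketch
> > using the 1.x HColumnDescriptor API - the family name is a placeholder:)
> >
> > ```java
> > import org.apache.hadoop.hbase.HColumnDescriptor;
> >
> > HColumnDescriptor family = new HColumnDescriptor("a");
> > family.setCacheBloomsOnWrite(true);    // bloom blocks: cheap, high value
> > family.setCacheIndexesOnWrite(true);   // index blocks: cheap, high value
> > family.setCacheDataOnWrite(false);     // data blocks: risks mass eviction
> > family.setPrefetchBlocksOnOpen(false); // avoid prefetch storms on open
> > ```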
> >
> >
> > -Vlad
> >
> >
> >
> > On Fri, Sep 20, 2019 at 12:51 PM Jacob LeBlanc <
> > jacob.leblanc@microfocus.com>
> > wrote:
> >
> > > Hi HBase Community!
> > >
> > > I have some questions on block caches around how the prefetch and
> > > cacheblocksonwrite settings work.
> > >
> > > In our production environments we've been having some performance
> > > issues with our HBase deployment (HBase 1.4.9 as part of AWS EMR
> > > 5.22, with data backed by S3).
> > >
> > > Looking into the issue, we've discovered that when regions of a
> > > particular table that are under heavy simultaneous write and read
> > > load go through a big compaction, the rpc handler threads will all
> > > block while servicing read requests to the region that was
> > > compacted. Here are a few relevant lines from a log where you can
> > > see the compaction happen. I've included a couple responseTooSlow
> > > warnings, but there are
> > > many more in the log after this:
> > >
> > > 2019-09-16 15:31:10,204 INFO [regionserver/ip-172-20-113-118.us-west-2.compute.internal/172.20.113.118:16020-shortCompactions-1568478085425] regionserver.HRegion: Starting compaction on a in region block_v2,\x07\x84\x8B>b\x00\x00\x14mU0p6,1567528560602.98be887c6f4938e0b492b17c669f3ac7.
> > > 2019-09-16 15:31:10,204 INFO [regionserver/ip-172-20-113-118.us-west-2.compute.internal/172.20.113.118:16020-shortCompactions-1568478085425] regionserver.HStore: Starting compaction of 8 file(s) in a of block_v2,\x07\x84\x8B>b\x00\x00\x14mU0p6,1567528560602.98be887c6f4938e0b492b17c669f3ac7. into tmpdir=s3://cmx-emr-hbase-us-west-2-oregon/hbase/data/default/block_v2/98be887c6f4938e0b492b17c669f3ac7/.tmp, totalSize=3.0 G
> > > 2019-09-16 15:33:55,572 INFO [regionserver/ip-172-20-113-118.us-west-2.compute.internal/172.20.113.118:16020-shortCompactions-1568478085425] dispatch.DefaultMultipartUploadDispatcher: Completed multipart upload of 24 parts 3144722724 bytes
> > > 2019-09-16 15:33:56,017 INFO [regionserver/ip-172-20-113-118.us-west-2.compute.internal/172.20.113.118:16020-shortCompactions-1568478085425] s3n2.S3NativeFileSystem2: rename s3://cmx-emr-hbase-us-west-2-oregon/hbase/data/default/block_v2/98be887c6f4938e0b492b17c669f3ac7/.tmp/eede47d55e06454ca72482ce33529669 s3://cmx-emr-hbase-us-west-2-oregon/hbase/data/default/block_v2/98be887c6f4938e0b492b17c669f3ac7/a/eede47d55e06454ca72482ce33529669
> > > 2019-09-16 15:34:03,328 WARN [RpcServer.default.FPBQ.Fifo.handler=3,queue=3,port=16020] ipc.RpcServer: (responseTooSlow): {"call":"Get(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest)","starttimems":1568648032777,"responsesize":562,"method":"Get","param":"region= block_v2,\\x07\\x84\\x8B>b\\x00\\x00\\x14mU0p6,1567528560602.98be887c6f4938e0b492b17c669f3ac7., row=\\x07\\x84\\x8B>b\\x00\\x00\\x14newAcPMKh/dkK2vGxPO1XI <TRUNCATED>","processingtimems":10551,"client":"172.20.132.45:51168","queuetimems":0,"class":"HRegionServer"}
> > > 2019-09-16 15:34:03,750 WARN [RpcServer.default.FPBQ.Fifo.handler=35,queue=5,port=16020] ipc.RpcServer: (responseTooSlow): {"call":"Get(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest)","starttimems":1568648032787,"responsesize":565,"method":"Get","param":"region= block_v2,\\x07\\x84\\x8B>b\\x00\\x00\\x14mU0p6,1567528560602.98be887c6f4938e0b492b17c669f3ac7., row=\\x07\\x84\\x8B>b\\x00\\x00\\x14nfet675AvHhY4nnKAV2iqu <TRUNCATED>","processingtimems":10963,"client":"172.20.112.226:52222","queuetimems":0,"class":"HRegionServer"}
> > >
> > > Note those log lines are from a "shortCompactions" thread. This also
> > > happens with major compactions, but I understand we can better
> > > control major compactions by running them manually in off hours if we
> > > choose.
> > >
> > > When this occurs we see the numCallsInGeneralQueue metric spike up,
> > > and some threads in our application that service REST API requests
> > > get tied up which causes some 504 gateway timeouts for end users.
> > >
> > > Thread dumps from the region server show that the rpc handler
> > > threads are blocking on an FSInputStream object (the read method is
> > > synchronized). Here is a pastebin of one such dump:
> > > https://pastebin.com/Mh0JWx3T
> > >
> > > Because we are running in AWS with data backed by S3 and we expect
> > > read latencies to be larger, we are hosting large bucket caches on
> > > the local disk of the region servers. So our understanding is that
> > > after the compaction, the relevant portions of the bucket cache are
> > > invalidated which is causing read requests to have to go to S3, and
> > > these are all trying to use the same input stream and block each
> > > other, and this continues until eventually the cache is populated
> > > enough so that performance returns to normal.
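> > >
> > > (To illustrate why the requests pile up rather than fan out - a toy
> > > sketch, not HBase code: since the FSInputStream read method is
> > > synchronized, handler threads sharing one stream take turns, each
> > > paying a full S3 round trip while the others wait:)
> > >
> > > ```java
> > > // Toy model of a shared, synchronized input stream. With N handler
> > > // threads and one stream, total wait grows roughly linearly in N.
> > > class SharedStream {
> > >   synchronized int read(byte[] buf) throws InterruptedException {
> > >     Thread.sleep(200); // stand-in for one S3 round trip
> > >     return buf.length;
> > >   }
> > > }
> > > ```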
> > >
> > > In an effort to mitigate the effects of compaction on the cache, we
> > > enabled the hbase.rs.cacheblocksonwrite setting on our region servers.
> > > My understanding was that this would be placing the blocks into the
> > > bucketcache while the new hfile was being written. However, after
> > > enabling this setting we are still seeing the same issue occur.
> > > Furthermore, we enabled the PREFETCH_BLOCKS_ON_OPEN setting on the
> > > column family and when we see this issue occur, one of the threads
> > > that is getting blocked from reading is the prefetching thread.
> > >
> > > Here are my questions:
> > > - Why is the hbase.rs.cacheblocksonwrite not seeming to work? Does
> > > it only work for flushing and not for compaction? I can see from the
> > > logs that the file is renamed after being written. Does that have
> > > something to do with why cacheblocksonwrite isn't working?
> > > - Why are the prefetching threads trying to read the same data? I
> > > thought that would only happen when a region is opened and I
> > > confirmed from the master and region server logs that wasn't
> > > happening. Maybe I have a misunderstanding of how/when prefetching
> comes into play?
> > > - Speaking more generally, any other thoughts on how we can avoid
> > > this issue? It seems a shame that we have this nicely populated
> > > bucketcache that is somewhat necessary with a slower file system
> > > (S3), but that the cache suddenly gets invalidated because of
> > > compactions happening.
> > > I'm wary of turning off compactions altogether during our peak load
> > > hours because I don't want updates to be blocked due to too many
> > > store files.
> > >
> > > Thanks for any help on this!
> > >
> > > --Jacob LeBlanc
> > >
> >
>
