hbase-user mailing list archives

From Jacob LeBlanc <jacob.lebl...@microfocus.com>
Subject RE: Cacheblocksonwrite not working during compaction?
Date Mon, 23 Sep 2019 02:29:32 GMT
My questions were primarily about how cacheblocksonwrite, prefetching, and compaction work
together, which I think is not AWS-specific. Although it may be that, yes, the 1+ hour
prefetching I am seeing is an AWS-specific phenomenon.

I've looked at the 1.4.9 source a bit more now that I have a better understanding of everything.
As you say, cacheDataOnWrite is hardcoded to false for compactions, so the hbase.rs.cacheblocksonwrite
setting has no effect in these cases.
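
For anyone following along, the relevant logic appears to be in
HStore.createWriterInTmp; paraphrasing from my reading of the 1.4 branch
(not an exact copy):

    if (isCompaction) {
      // writer opened by a compaction thread: data block caching is
      // disabled unconditionally, regardless of hbase.rs.cacheblocksonwrite
      writerCacheConf = new CacheConfig(cacheConf);
      writerCacheConf.setCacheDataOnWrite(false);
    } else {
      writerCacheConf = cacheConf;
    }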

I also now understand that the cache key is partly based on the filename, so disabling
hbase.rs.evictblocksonclose isn't going to help for compactions either, since the
pre-compaction filenames will no longer be relevant.
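
If I'm reading it right, cache entries are keyed roughly like this, so blocks
cached under the old file names can never be hits against the newly written
compacted file:

    // block cache entries are keyed by (hfile name, block offset); a
    // compaction produces a file with a new name, so none of the old
    // entries can ever match reads against it
    BlockCacheKey cacheKey = new BlockCacheKey(hfileName, blockOffset);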

Prefetching also makes more sense now that I've looked at the code. I see that it comes into
effect in HFileReaderV2, so it happens on a per-file basis, not per-region. I was confused
before about why I was seeing prefetching happen when the region had not been opened recently,
but now it makes sense: it occurs when the compacted file is opened, not when the region is.
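
From my reading of the HFileReaderV2 constructor, the prefetch hook looks
roughly like this (paraphrased, details omitted):

    if (cacheConf.shouldPrefetchOnOpen()) {
      PrefetchExecutor.request(path, new Runnable() {
        @Override
        public void run() {
          // walks every block in this HFile via readBlock(), which
          // populates the block cache as a side effect; a compacted
          // file is a newly opened HFile, so this runs after
          // compactions too, not just on region open
        }
      });
    }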

So unfortunately, it looks like I'm sunk in terms of caching data during compaction. Thanks
for the help in understanding this.

However, I do think this is a valid use case, and it seems like it should be fairly easy
to support with a new cache config setting. On the one hand, there is this nice prefetching
feature, which acknowledges the use case where people want to cache entire tables, a use case
that becomes more common with larger L2 caches. On the other hand, there is a hardcoded
setting that assumes nobody would ever want to cache the blocks written during a compaction,
which seems at odds with the use case prefetching is trying to address. Don't get me wrong:
I understand that in many use cases caching while writing during compaction is undesirable,
because you don't want to evict blocks you care about during the compaction process. In other
words, it throws a big monkey wrench into the concept of an LRU cache. I also realize that
hbase.rs.cacheblocksonwrite is geared more towards flushes, for use cases where people often
read what was recently written and don't necessarily want to cache the entire table. But a
new config option (call it hbase.rs.cacheblocksoncompaction?) to address this specific use
case would be nice.

I'll plan on opening a JIRA ticket for this and I'd also be happy to take a stab at creating
a patch.
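
Roughly, I'm imagining something like the following in HStore.createWriterInTmp
(sketch only; hbase.rs.cacheblocksoncompaction is just my proposed name, not an
existing key):

    if (isCompaction) {
      writerCacheConf = new CacheConfig(cacheConf);
      // proposed: let operators opt in to caching blocks written
      // by compactions
      writerCacheConf.setCacheDataOnWrite(
          conf.getBoolean("hbase.rs.cacheblocksoncompaction", false));
    } else {
      writerCacheConf = cacheConf;
    }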

--Jacob LeBlanc

-----Original Message-----
From: Vladimir Rodionov [mailto:vladrodionov@gmail.com] 
Sent: Friday, September 20, 2019 10:29 PM
To: user@hbase.apache.org
Subject: Re: Cacheblocksonwrite not working during compaction?

You are asking questions on the Apache HBase user forum that are more appropriate for an AWS
forum, given that you are using an Amazon-specific distribution of HBase and an Amazon-specific
implementation of an S3 file system.

As for hbase.rs.cacheblocksonwrite not working: HBase ignores this flag and forcefully sets
it to false if the file writer is opened by a compaction thread (this is true for 2.x, and I
am pretty sure it is the same in 1.x).

-Vlad

On Fri, Sep 20, 2019 at 4:24 PM Jacob LeBlanc <jacob.leblanc@microfocus.com>
wrote:

> Thank you for the feedback!
>
> Our cache size *is* larger than our data size, at least for our 
> heavily accessed tables. Memory may be prohibitively expensive for 
> keeping large tables in an in-memory cache, but storage is cheap, so 
> hosting a 1 TB bucketcache on the local disk of each of our region 
> servers is feasible and that is what we are trying to accomplish.
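>
> For reference, the bucket cache setup is along these lines in
> hbase-site.xml (values illustrative, not our exact config):
>
>     <property>
>       <name>hbase.bucketcache.ioengine</name>
>       <value>file:/mnt/bucketcache/cache.data</value>
>     </property>
>     <property>
>       <!-- interpreted as MB when the value is greater than 1 -->
>       <name>hbase.bucketcache.size</name>
>       <value>1048576</value>
>     </property>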
>
> I'm not sure I understand the complexity of populating a cache that is 
> supposed to represent the data in files on disk while writing out one 
> of those files during the compaction process. In fact, that's what I 
> understood the hbase.rs.cacheblocksonwrite setting to do (based on nothing 
> more than the description of the setting in the online hbase book - I 
> don't see very good documentation online for this feature). If that 
> setting doesn't do that, then what does it do exactly? What about the 
> hbase.rs.evictblocksonclose setting? Could that be evicting all of the 
> blocks that are put in the cache at the end of compaction? What are 
> the implications if we set that to "false"?
>
> Prefetching is also OK for us to do on some tables because we are 
> using the on-disk cache (I understand this also means opening a region 
> after a split or move will take longer). But I don't understand why it 
> appeared that prefetching was being done when the region wasn't opened 
> recently. I don't expect prefetching to help us with compactions, but 
> seeing the thread getting blocked after a compaction just raised a red 
> flag that I'm not understanding what is going on.
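>
> For reference, we enable prefetching per column family from the HBase
> shell, something like this (using the table and family names from the
> logs below):
>
>     hbase> alter 'block_v2', {NAME => 'a', PREFETCH_BLOCKS_ON_OPEN => 'true'}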
>
> I understand that some latency during compaction is expected, but what 
> we are seeing is fairly extreme. The instances take thread dumps every 
> 15 minutes, and we saw threads still in a BLOCKED state on the same 
> input stream object an hour later! This is after a 3.0 GB compaction 
> had already finished. If prefetching was happening, something seems 
> wrong if it takes over an hour to populate 3.0 GB worth of data into a 
> local disk cache from S3.
>
> I appreciate the help on this!
>
> --Jacob LeBlanc
>
> -----Original Message-----
> From: Vladimir Rodionov [mailto:vladrodionov@gmail.com]
> Sent: Friday, September 20, 2019 6:41 PM
> To: user@hbase.apache.org
> Subject: Re: Cacheblocksonwrite not working during compaction?
>
> >> - Why is the hbase.rs.cacheblocksonwrite not seeming to work? Does it
> >> only work for flushing and not for compaction? I can see from the logs
> >> that the file is renamed after being written. Does that have
> >> something to do with why cacheblocksonwrite isn't working?
>
> Generally, it is a very bad idea to enable caching on read/write 
> during compaction unless your cache size is larger than your data size 
> (which is not a common case).
> Cache invalidation during compaction is almost inevitable, due to the 
> complexity of the potential optimizations in this case. Actually, 
> there is some work (papers) on the Internet dedicated to smarter cache 
> invalidation algorithms for LSM-derived storage engines, but 
> engineers, as usual, are much more conservative than academic 
> researchers and are not eager to implement novel (not battle-tested) algorithms.
> Latency spikes during compaction are normal and inevitable, at least 
> for HBase, and especially when one deals with S3 or any other cloud 
> storage. S3 read latency can sometimes reach seconds, and the only 
> possible mitigation for these huge latency spikes would be a 
> very-smart-cache-invalidation-during-compaction algorithm (which does not exist yet).
>
> For your case, I would recommend the following settings:
>
> *CACHE_BLOOM_BLOCKS_ON_WRITE_KEY = true*
>
> *CACHE_INDEX_BLOCKS_ON_WRITE_KEY = true*
>
> *CACHE_DATA_BLOCKS_ON_WRITE_KEY = false (bad idea to set it to true)*
>
>
> PREFETCH_BLOCKS_ON_OPEN should be false as well, unless your table is 
> small and your application does this once, on startup.
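>
> For anyone mapping those constants to hbase-site.xml keys, I believe 
> the correspondence is:
>
>     hfile.block.bloom.cacheonwrite = true
>     hfile.block.index.cacheonwrite = true
>     hbase.rs.cacheblocksonwrite    = false
>     hbase.rs.prefetchblocksonopen  = false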
>
>
> -Vlad
>
>
>
> On Fri, Sep 20, 2019 at 12:51 PM Jacob LeBlanc <jacob.leblanc@microfocus.com>
> wrote:
>
> > Hi HBase Community!
> >
> > I have some questions on block caches around how the prefetch and 
> > cacheblocksonwrite settings work.
> >
> > In our production environments we've been having some performance 
> > issues with our HBase deployment (HBase 1.4.9 as part of AWS EMR 
> > 5.22, with data backed by S3).
> >
> > Looking into the issue, we've discovered that when regions of a 
> > particular table that are under heavy simultaneous write and read 
> > load go through a big compaction, the rpc handler threads will all 
> > block while servicing read requests to the region that was 
> > compacted. Here are a few relevant lines from a log where you can 
> > see the compaction happen. I've included a couple responseTooSlow 
> > warnings, but there are
> many more in the log after this:
> >
> > 2019-09-16 15:31:10,204 INFO [regionserver/ip-172-20-113-118.us-west-2.compute.internal/172.20.113.118:16020-shortCompactions-1568478085425] regionserver.HRegion: Starting compaction on a in region block_v2,\x07\x84\x8B>b\x00\x00\x14mU0p6,1567528560602.98be887c6f4938e0b492b17c669f3ac7.
> > 2019-09-16 15:31:10,204 INFO [regionserver/ip-172-20-113-118.us-west-2.compute.internal/172.20.113.118:16020-shortCompactions-1568478085425] regionserver.HStore: Starting compaction of 8 file(s) in a of block_v2,\x07\x84\x8B>b\x00\x00\x14mU0p6,1567528560602.98be887c6f4938e0b492b17c669f3ac7. into tmpdir=s3://cmx-emr-hbase-us-west-2-oregon/hbase/data/default/block_v2/98be887c6f4938e0b492b17c669f3ac7/.tmp, totalSize=3.0 G
> > 2019-09-16 15:33:55,572 INFO [regionserver/ip-172-20-113-118.us-west-2.compute.internal/172.20.113.118:16020-shortCompactions-1568478085425] dispatch.DefaultMultipartUploadDispatcher: Completed multipart upload of 24 parts 3144722724 bytes
> > 2019-09-16 15:33:56,017 INFO [regionserver/ip-172-20-113-118.us-west-2.compute.internal/172.20.113.118:16020-shortCompactions-1568478085425] s3n2.S3NativeFileSystem2: rename s3://cmx-emr-hbase-us-west-2-oregon/hbase/data/default/block_v2/98be887c6f4938e0b492b17c669f3ac7/.tmp/eede47d55e06454ca72482ce33529669 s3://cmx-emr-hbase-us-west-2-oregon/hbase/data/default/block_v2/98be887c6f4938e0b492b17c669f3ac7/a/eede47d55e06454ca72482ce33529669
> > 2019-09-16 15:34:03,328 WARN [RpcServer.default.FPBQ.Fifo.handler=3,queue=3,port=16020] ipc.RpcServer: (responseTooSlow): {"call":"Get(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest)","starttimems":1568648032777,"responsesize":562,"method":"Get","param":"region=block_v2,\\x07\\x84\\x8B>b\\x00\\x00\\x14mU0p6,1567528560602.98be887c6f4938e0b492b17c669f3ac7., row=\\x07\\x84\\x8B>b\\x00\\x00\\x14newAcPMKh/dkK2vGxPO1XI <TRUNCATED>","processingtimems":10551,"client":"172.20.132.45:51168","queuetimems":0,"class":"HRegionServer"}
> > 2019-09-16 15:34:03,750 WARN [RpcServer.default.FPBQ.Fifo.handler=35,queue=5,port=16020] ipc.RpcServer: (responseTooSlow): {"call":"Get(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest)","starttimems":1568648032787,"responsesize":565,"method":"Get","param":"region=block_v2,\\x07\\x84\\x8B>b\\x00\\x00\\x14mU0p6,1567528560602.98be887c6f4938e0b492b17c669f3ac7., row=\\x07\\x84\\x8B>b\\x00\\x00\\x14nfet675AvHhY4nnKAV2iqu <TRUNCATED>","processingtimems":10963,"client":"172.20.112.226:52222","queuetimems":0,"class":"HRegionServer"}
> >
> > Note those log lines are from a "shortCompactions" thread. This also 
> > happens with major compactions, but I understand we can better 
> > control major compactions by running them manually in off hours if we choose.
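> >
> > For example (this is my understanding, not something we've fully 
> > tested): time-based major compactions can be disabled in 
> > hbase-site.xml and then triggered manually from the shell in off 
> > hours:
> >
> >     <property>
> >       <!-- 0 disables periodic major compactions -->
> >       <name>hbase.hregion.majorcompaction</name>
> >       <value>0</value>
> >     </property>
> >
> >     hbase> major_compact 'block_v2'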
> >
> > When this occurs we see the numCallsInGeneralQueue metric spike up, 
> > and some threads in our application that service REST API requests 
> > get tied up which causes some 504 gateway timeouts for end users.
> >
> > Thread dumps from the region server show that the rpc handler 
> > threads are blocking on an FSInputStream object (the read method is 
> > synchronized). Here is a pastebin of one such dump:
> > https://pastebin.com/Mh0JWx3T
> >
> > Because we are running in AWS with data backed by S3 and we expect 
> > read latencies to be larger, we are hosting large bucket caches on 
> > the local disk of the region servers. So our understanding is that 
> > after the compaction, the relevant portions of the bucket cache are 
> > invalidated which is causing read requests to have to go to S3, and 
> > these are all trying to use the same input stream and block each 
> > other, and this continues until eventually the cache is populated 
> > enough so that performance returns to normal.
> >
> > In an effort to mitigate the effects of compaction on the cache, we 
> > enabled the hbase.rs.cacheblocksonwrite setting on our region servers.
> > My understanding was that this would be placing the blocks into the 
> > bucketcache while the new hfile was being written. However, after 
> > enabling this setting we are still seeing the same issue occur.
> > Furthermore, we enabled the PREFETCH_BLOCKS_ON_OPEN setting on the 
> > column family and when we see this issue occur, one of the threads 
> > that is getting blocked from reading is the prefetching thread.
> >
> > Here are my questions:
> > - Why is the hbase.rs.cacheblocksonwrite not seeming to work? Does 
> > it only work for flushing and not for compaction? I can see from the 
> > logs that the file is renamed after being written. Does that have 
> > something to do with why cacheblocksonwrite isn't working?
> > - Why are the prefetching threads trying to read the same data? I 
> > thought that would only happen when a region is opened and I 
> > confirmed from the master and region server logs that wasn't 
> > happening. Maybe I have a misunderstanding of how/when prefetching comes into play?
> > - Speaking more generally, any other thoughts on how we can avoid 
> > this issue? It seems a shame that we have this nicely populated 
> > bucketcache that is somewhat necessary with a slower file system 
> > (S3), but that the cache suddenly gets invalidated because of compactions happening.
> > I'm wary of turning off compactions altogether during our peak load 
> > hours because I don't want updates to be blocked due to too many 
> > store files.
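> >
> > (If we do experiment with that, my understanding is that the 
> > relevant threshold is hbase.hstore.blockingStoreFiles, e.g. in 
> > hbase-site.xml:
> >
> >     <property>
> >       <!-- updates to a region block once any of its stores has this
> >            many store files; 10 is, I believe, the default -->
> >       <name>hbase.hstore.blockingStoreFiles</name>
> >       <value>10</value>
> >     </property>
> >
> > so raising it a bit might be an alternative to disabling compactions.)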
> >
> > Thanks for any help on this!
> >
> > --Jacob LeBlanc
> >
>