hbase-user mailing list archives

From Jacob LeBlanc <jacob.lebl...@microfocus.com>
Subject Cacheblocksonwrite not working during compaction?
Date Fri, 20 Sep 2019 19:48:08 GMT
Hi HBase Community!

I have some questions about block caching, specifically how the prefetch and cacheblocksonwrite
settings work.

In our production environments we've been having some performance issues with our HBase deployment
(HBase 1.4.9 as part of AWS EMR 5.22, with data backed by S3).

Looking into the issue, we've discovered that when a region of a particular table that is
under heavy simultaneous write and read load goes through a big compaction, the RPC handler
threads all block while servicing read requests to the region that was just compacted. Here
are a few relevant lines from a log where you can see the compaction happen. I've included
a couple of responseTooSlow warnings, but there are many more in the log after this:

2019-09-16 15:31:10,204 INFO  [regionserver/ip-172-20-113-118.us-west-2.compute.internal/172.20.113.118:16020-shortCompactions-1568478085425]
regionserver.HRegion: Starting compaction on a in region block_v2,\x07\x84\x8B>b\x00\x00\x14mU0p6,1567528560602.98be887c6f4938e0b492b17c669f3ac7.
2019-09-16 15:31:10,204 INFO  [regionserver/ip-172-20-113-118.us-west-2.compute.internal/172.20.113.118:16020-shortCompactions-1568478085425]
regionserver.HStore: Starting compaction of 8 file(s) in a of block_v2,\x07\x84\x8B>b\x00\x00\x14mU0p6,1567528560602.98be887c6f4938e0b492b17c669f3ac7.
into tmpdir=s3://cmx-emr-hbase-us-west-2-oregon/hbase/data/default/block_v2/98be887c6f4938e0b492b17c669f3ac7/.tmp,
totalSize=3.0 G
2019-09-16 15:33:55,572 INFO  [regionserver/ip-172-20-113-118.us-west-2.compute.internal/172.20.113.118:16020-shortCompactions-1568478085425]
dispatch.DefaultMultipartUploadDispatcher: Completed multipart upload of 24 parts 3144722724
bytes
2019-09-16 15:33:56,017 INFO  [regionserver/ip-172-20-113-118.us-west-2.compute.internal/172.20.113.118:16020-shortCompactions-1568478085425]
s3n2.S3NativeFileSystem2: rename s3://cmx-emr-hbase-us-west-2-oregon/hbase/data/default/block_v2/98be887c6f4938e0b492b17c669f3ac7/.tmp/eede47d55e06454ca72482ce33529669
s3://cmx-emr-hbase-us-west-2-oregon/hbase/data/default/block_v2/98be887c6f4938e0b492b17c669f3ac7/a/eede47d55e06454ca72482ce33529669
2019-09-16 15:34:03,328 WARN  [RpcServer.default.FPBQ.Fifo.handler=3,queue=3,port=16020] ipc.RpcServer:
(responseTooSlow): {"call":"Get(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest)","starttimems":1568648032777,"responsesize":562,"method":"Get","param":"region=
block_v2,\\x07\\x84\\x8B>b\\x00\\x00\\x14mU0p6,1567528560602.98be887c6f4938e0b492b17c669f3ac7.,
row=\\x07\\x84\\x8B>b\\x00\\x00\\x14newAcPMKh/dkK2vGxPO1XI <TRUNCATED>","processingtimems":10551,"client":"172.20.132.45:51168","queuetimems":0,"class":"HRegionServer"}
2019-09-16 15:34:03,750 WARN  [RpcServer.default.FPBQ.Fifo.handler=35,queue=5,port=16020]
ipc.RpcServer: (responseTooSlow): {"call":"Get(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest)","starttimems":1568648032787,"responsesize":565,"method":"Get","param":"region=
block_v2,\\x07\\x84\\x8B>b\\x00\\x00\\x14mU0p6,1567528560602.98be887c6f4938e0b492b17c669f3ac7.,
row=\\x07\\x84\\x8B>b\\x00\\x00\\x14nfet675AvHhY4nnKAV2iqu <TRUNCATED>","processingtimems":10963,"client":"172.20.112.226:52222","queuetimems":0,"class":"HRegionServer"}

Note that those log lines are from a "shortCompactions" thread. This also happens with major
compactions, but I understand we can better control major compactions by running them manually
during off-hours if we choose.

When this occurs we see the numCallsInGeneralQueue metric spike, and some of the threads in our
application that service REST API requests get tied up, which causes 504 gateway timeouts
for end users.

Thread dumps from the region server show that the rpc handler threads are blocking on an FSInputStream
object (the read method is synchronized). Here is a pastebin of one such dump: https://pastebin.com/Mh0JWx3T
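
If I'm reading the Hadoop code correctly, the base FSInputStream implements positional reads by
synchronizing on the stream, seeking, reading, and seeking back, which would explain why concurrent
reads against the same file serialize. Roughly (paraphrased from memory, not the exact source):

public int read(long position, byte[] buffer, int offset, int length) throws IOException {
  synchronized (this) {            // every reader contends on the same stream object
    long oldPos = getPos();
    int nread = -1;
    try {
      seek(position);              // reposition the single underlying stream
      nread = read(buffer, offset, length);
    } finally {
      seek(oldPos);                // restore the previous position
    }
    return nread;
  }
}

So every handler reading from the newly compacted store file appears to wait its turn on that lock.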

Because we are running in AWS with data backed by S3 and expect read latencies to be higher,
we host large bucket caches on the local disk of the region servers. Our understanding is that
after the compaction the relevant portions of the bucket cache are invalidated, so read requests
have to go to S3; those requests all contend for the same input stream and block each other,
and this continues until the cache is repopulated enough for performance to return to normal.
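
For context, the bucket cache is configured along these lines in hbase-site.xml (the path and
size here are illustrative placeholders rather than our exact values):

<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>file:/mnt/hbase/bucketcache</value>  <!-- file-backed cache on local disk -->
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>65536</value>  <!-- capacity in MB; placeholder value -->
</property>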

In an effort to mitigate the effects of compaction on the cache, we enabled the hbase.rs.cacheblocksonwrite
setting on our region servers. My understanding was that this would place the blocks into the
bucket cache while the new HFile was being written. However, after enabling this setting we are
still seeing the same issue. Furthermore, we enabled the PREFETCH_BLOCKS_ON_OPEN setting on the
column family, and when the issue occurs, one of the threads blocked from reading is the
prefetching thread.
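
For reference, the settings were enabled roughly as follows (the column family name "a" is the
one from the region in the logs above; treat this as a sketch rather than the exact change we
made). In hbase-site.xml on the region servers:

<property>
  <name>hbase.rs.cacheblocksonwrite</name>
  <value>true</value>
</property>

and for the column family, via the HBase shell:

alter 'block_v2', {NAME => 'a', PREFETCH_BLOCKS_ON_OPEN => 'true'}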

Here are my questions:
- Why does hbase.rs.cacheblocksonwrite not seem to be working? Does it only apply to flushes
and not to compactions? I can see from the logs that the file is renamed after being written.
Does that have something to do with why cacheblocksonwrite isn't working?
- Why are the prefetching threads trying to read the same data? I thought that would only
happen when a region is opened, and I confirmed from the master and region server logs that
no region open was happening. Maybe I have a misunderstanding of how/when prefetching comes into play?
- Speaking more generally, any other thoughts on how we can avoid this issue? It seems a shame
that we have this nicely populated bucket cache, which is somewhat necessary with a slower file
system (S3), only for the cache to be suddenly invalidated by compactions. I'm wary of turning
off compactions altogether during our peak load hours because I don't want updates to be blocked
due to too many store files.

Thanks for any help on this!

--Jacob LeBlanc
