hbase-user mailing list archives

From Sergey Sova <sergey.sov...@gmail.com>
Subject Re: Can't improve low random-read speed
Date Fri, 16 Mar 2018 21:28:32 GMT
Finally found 2 stupid issues that caused these problems:
1 - The internet provider's speed limit was almost the same as the throughput I
got in the results. So when I ran the client code on the server where HBase is
located, speed roughly doubled.
2 - We did not specify a column family in our Get objects, so we always read
all the data. After the CF was set, I got 26489 KB/s, so speed increased
about 10 times. I think that's enough for now.
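In the HBase Java client, a `Get` built from a row key alone returns every column family of that row, while `Get.addFamily(byte[])` restricts it to one family (for example `get.addFamily(Bytes.toBytes("small"))`, where `"small"` is a hypothetical family name). The snippet below is only a toy model in plain Java with no HBase dependency; the family names and sizes are assumptions, chosen to show how much an unfiltered Get over-fetches.

```java
import java.util.List;
import java.util.Map;

public class FamilySelectionModel {
    // Toy model of a batched Get: each row maps a column family name to its stored size in bytes.
    // family == null models a Get with no addFamily() call, so every family is returned.
    static long bytesReturned(List<Map<String, Long>> rows, String family) {
        long total = 0;
        for (Map<String, Long> row : rows) {
            if (family == null) {
                for (long size : row.values()) {
                    total += size;
                }
            } else {
                total += row.getOrDefault(family, 0L);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: a large "big" family and a ~10 KB "small" family per row.
        List<Map<String, Long>> batch = List.of(
            Map.of("big", 500_000L, "small", 10_000L),
            Map.of("big", 2_000_000L, "small", 10_000L));
        System.out.println("no family set:  " + bytesReturned(batch, null) + " bytes");
        System.out.println("family 'small': " + bytesReturned(batch, "small") + " bytes");
    }
}
```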

Sorry for the bother.

2018-03-16 18:36 GMT+03:00 Sergey Sova <sergey.sova42@gmail.com>:

> Hi. I've been investigating an issue with low random-read speed for a few
> days and I'm still stuck. I've already read a few mailing list threads; they
> didn't help, though my setup is a bit different.
>
> HBase setup:
> 3 nodes, 8GB heap for RegionServers, 1 master
> Table pre-split into 120 regions (partitioning), ~120M rows.
> Each row has 2 CFs: one contains data varying from 50KB to 4MB, the other is
> smaller, maybe around 10KB; rows can differ significantly in size. Both use GZ
> compression, with block sizes of 1.3MB and 32KB respectively.
> HDDs in RAID 1, 3TB SATA 6Gb/s 7200 RPM enterprise
> Network speed between nodes is 1Gbit/s
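The 1Gbit/s figure bounds traffic between nodes, not the client's own link. A quick sanity check is to express measured throughput as a share of a link ceiling. A minimal sketch, assuming nothing about which unit the test tool's "KB" uses (the thread does not say whether it means 1000 or 1024 bytes), so the helper takes it as a parameter:

```java
public class LinkCeiling {
    // Converts a link rate in Gbit/s to kilobytes per second.
    // bytesPerKb is 1000 (KB) or 1024 (KiB); which one the test tool used is an assumption.
    static double gbitToKbPerSec(double gbit, int bytesPerKb) {
        return gbit * 1e9 / 8.0 / bytesPerKb;
    }

    // Measured throughput as a percentage of the ceiling.
    static double utilizationPercent(double measuredKbPerSec, double ceilingKbPerSec) {
        return 100.0 * measuredKbPerSec / ceilingKbPerSec;
    }

    public static void main(String[] args) {
        double ceiling = gbitToKbPerSec(1.0, 1024);
        System.out.printf("1 Gbit/s ceiling: %.0f KB/s%n", ceiling);
        // Throughput figures quoted in this thread (before and after the fixes).
        for (double measured : new double[] {3340, 26489}) {
            System.out.printf("%.0f KB/s = %.1f%% of the link%n",
                measured, utilizationPercent(measured, ceiling));
        }
    }
}
```

If a client's throughput sits far below the cluster's internal link but matches some external cap, the bottleneck is likely outside HBase.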
>
> Access pattern: get 50 random rows at once.
>
> I run tests in 10 threads from a single application, testing both column
> families, doing ~200 requests in total, ~20 per thread.
> The bigger CF takes more time to load; I don't see other differences.
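The load described above (10 threads, ~20 requests each, 50 random rows per request) can be sketched as a small harness. `fetchBatch` is a hypothetical stand-in for the real batched read (in the HBase client this would typically be `Table.get(List<Get>)`); here it is stubbed so the harness runs without a cluster, and the 50 KB per row stub value is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

public class RandomGetHarness {
    // Each worker issues requestsPerThread batches of rowsPerBatch random row indices.
    // fetchBatch is a hypothetical stand-in for a real batched read; it returns bytes received.
    // Returns {rows requested, bytes received} summed over all workers.
    static long[] run(int threads, int requestsPerThread, int rowsPerBatch, long rowCount,
                      Function<long[], Long> fetchBatch) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<long[]>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    long workerRows = 0;
                    long workerBytes = 0;
                    for (int r = 0; r < requestsPerThread; r++) {
                        long[] keys = new long[rowsPerBatch];
                        for (int i = 0; i < rowsPerBatch; i++) {
                            keys[i] = ThreadLocalRandom.current().nextLong(rowCount);
                        }
                        workerBytes += fetchBatch.apply(keys);
                        workerRows += keys.length;
                    }
                    return new long[] {workerRows, workerBytes};
                }));
            }
            long rows = 0;
            long bytes = 0;
            for (Future<long[]> f : futures) {
                long[] part = f.get();
                rows += part[0];
                bytes += part[1];
            }
            return new long[] {rows, bytes};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Stub: every row counts as 50 KB, the lower bound quoted for the big family.
        long[] totals = run(10, 20, 50, 120_000_000L, keys -> keys.length * 50_000L);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("rows=%d bytes=%d elapsed=%.3f s%n", totals[0], totals[1], seconds);
    }
}
```

Swapping the stub for a real client call keeps the thread and batch structure identical, so timings stay comparable between runs.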
>
> Results for big CF:
>
> Loaded data in 102 sec
> Received 341065 KB  (341 MB)
> Bandwidth: 3340 KB/Sec
> Avg time: 7234 ms
>
> If I perform a scan (caching=100) over this table from my local machine, I
> get these results:
> Read 10000 items total
> Finished in 94162 ms
> KBytes read: 199451
> Avg speed: 2117 KB/s
> Not sure if this helps, because it's a different load pattern, but
> maybe it clarifies something.
> It looks strange to me that scan speed is as low as random-read speed.
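The reported throughput figures can be re-derived as KB divided by elapsed seconds. A minimal sketch using only the numbers quoted above; since the 102 s duration is already rounded, expect small differences from the reported 3340 and 2117 KB/s.

```java
public class ThroughputCheck {
    // KB per second from a size in KB and a duration in milliseconds.
    static double kbPerSec(double kb, double millis) {
        return kb / (millis / 1000.0);
    }

    public static void main(String[] args) {
        // Figures copied from the thread.
        System.out.printf("random gets: %.0f KB/s%n", kbPerSec(341_065, 102_000));
        System.out.printf("scan:        %.0f KB/s, %.0f rows/s%n",
            kbPerSec(199_451, 94_162), 10_000 / (94_162 / 1000.0));
    }
}
```

Rows per second is often the more telling scan metric, because per-row size varies so much in this table.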
>
> My thoughts:
> 1. The access pattern is not good for HDDs (50 random reads at once, in 10
> threads), but I can't say for sure how bad it is.
> 2. Row size is rather big. I've read several posts about the same problem;
> all had small rows. The general advice was to tune the block size so HBase
> wouldn't read too much data from disk; I think that's not my case.
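For thought 1, a back-of-envelope seek bound helps gauge how bad the access pattern can be. A minimal sketch: the 7200 RPM and node count come from the thread, but the 8.5 ms average seek time, both RAID 1 mirrors serving reads, and one seek per uncached row are all assumptions, not measurements.

```java
public class HddSeekBound {
    // Average rotational latency: half a revolution, in milliseconds.
    static double avgRotationalLatencyMs(int rpm) {
        return 0.5 * 60_000.0 / rpm;
    }

    // Rough random-read IOPS for one spindle: one seek plus half a rotation per read.
    // Transfer time is ignored, which flatters large rows.
    static double randomIops(double avgSeekMs, int rpm) {
        return 1000.0 / (avgSeekMs + avgRotationalLatencyMs(rpm));
    }

    public static void main(String[] args) {
        double seekMs = 8.5;        // ASSUMPTION: typical 7200 RPM seek time, not measured
        int nodes = 3;
        int spindlesPerNode = 2;    // ASSUMPTION: RAID 1 reads can be served by either mirror
        double perDisk = randomIops(seekMs, 7200);
        double cluster = perDisk * nodes * spindlesPerNode;
        System.out.printf("per disk: %.0f reads/s, cluster: %.0f reads/s%n", perDisk, cluster);
        // ASSUMPTION: each of the 50 rows in a batch misses the block cache and costs one seek.
        System.out.printf("batches of 50 uncached rows: at most %.1f per second%n", cluster / 50);
    }
}
```

Treat the result as an optimistic ceiling: rows that span several blocks or HFiles need more than one seek, and multi-megabyte rows add real transfer time.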
>
> Other observations: RegionServer GC logs look OK, and RegionServer CPU
> profiling does not show anything strange.
>
> So, can someone give some hints or directions?
> Thanks in advance.
>
> Sergey
>
>
