hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Performance test results
Date Mon, 02 May 2011 19:14:46 GMT
It might be the slow memstore issue... after inserting your dataset,
issue a flush on your table in the shell, wait a few seconds, then
start reading. Someone else on the mailing list recently ran into this
type of issue.
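
For example, assuming your table is called 'mytable' (substitute your
own table name), in the HBase shell:

  hbase(main):001:0> flush 'mytable'

then give it a few seconds before starting the read test.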

Regarding the block cache logging, here's what I see in my logs:

2011-05-02 10:05:38,718 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU
eviction started; Attempting to free 303.77 MB of total=2.52 GB
2011-05-02 10:05:38,751 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: Block cache LRU
eviction completed; freed=303.8 MB, total=2.22 GB, single=755.67 MB,
multi=1.76 GB, memory=0 KB
2011-05-02 10:07:18,737 DEBUG
org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=2.27
GB, free=718.03 MB, max=2.97 GB, blocks=36450, accesses=1056364760,
hits=939002423, hitRatio=88.88%%, cachingAccesses=967172747,
cachingHits=932095548, cachingHitsRatio=96.37%%, evictions=7801,
evicted=35040749, evictedPerRun=4491.8276367187

Keep in mind that we don't currently have a moving average for those
percentages, so over time the numbers become more or less set in stone...
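
If you don't see those lines at all it's probably just the log level;
something along these lines in conf/log4j.properties should surface
them (a sketch, assuming the stock Log4j setup HBase ships with):

  # Log the LruBlockCache eviction/stats messages shown above
  log4j.logger.org.apache.hadoop.hbase.io.hfile.LruBlockCache=DEBUG

They're printed periodically by the cache itself at DEBUG level.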

The handler config only helps if you are using a ton of clients,
which doesn't seem to be the case (at least for now).

J-D

On Wed, Apr 27, 2011 at 6:42 AM, Eran Kutner <eran@> wrote:
> I must say, the more I play with it the more baffled I am by the
> results. I ran the read test again today, after not touching the
> cluster for a couple of days, and now I'm getting the same high read
> numbers (10-11K reads/sec per server, with some servers reaching even
> 15K r/s) whether I read 1, 10, 100 or even 1000 rows from every key
> space. However, 5000 rows yielded a read rate of only 3K rows per
> second, even after a very long time. To be clear, I'm always randomly
> reading a single row per request; the row counts I'm talking about
> are the sizes of the ranges within each key space that I randomly
> select my keys from.
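>
> In other words, the read pattern is roughly this (a sketch, not the
> actual test code; the table name and key format here are made up):
>
>   import java.util.Random;
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.hbase.HBaseConfiguration;
>   import org.apache.hadoop.hbase.client.Get;
>   import org.apache.hadoop.hbase.client.HTable;
>   import org.apache.hadoop.hbase.client.Result;
>   import org.apache.hadoop.hbase.util.Bytes;
>
>   public class RandomReadTest {
>     public static void main(String[] args) throws Exception {
>       Configuration conf = HBaseConfiguration.create();
>       HTable table = new HTable(conf, "perftest");  // made-up table name
>       Random rnd = new Random();
>       int keySpaces = 100;   // number of key spaces
>       int rangeSize = 5000;  // rows per key space that keys are drawn from
>       for (int i = 0; i < 1000000; i++) {
>         // one random single-row Get per request
>         String key = rnd.nextInt(keySpaces) + "-" + rnd.nextInt(rangeSize);
>         Result r = table.get(new Get(Bytes.toBytes(key)));
>       }
>       table.close();
>     }
>   }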
>
> St.Ack - to answer your questions:
>
> Writing from two machines increased the total number of writes per
> second by about 10%, maybe less. Reads showed a 15-20% increase when
> run from two machines.
>
> I already had most of the performance tuning recommendations
> implemented (garbage collection, the new memory slabs feature, LZO)
> when I ran my previous test. The only config I didn't have was
> "hbase.regionserver.handler.count", so I changed it to 128, or 16
> threads per core, which seems like a reasonable number, and tried
> inserting to the same key ranges as before. It didn't seem to make
> any difference in the total performance.
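>
> For reference, that change is just this in hbase-site.xml on the
> region servers, followed by a restart:
>
>   <property>
>     <name>hbase.regionserver.handler.count</name>
>     <value>128</value>
>   </property>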
>
> My keys are about 15 bytes long.
>
> As for caching, I can't find those cache hit ratio numbers in my
> logs; do they require a special parameter to enable them? That said,
> my calculations show that the entire data set I'm randomly reading
> should easily fit in the servers' memory. Each row has 15 bytes of
> key + 128 bytes of data + overhead - let's say 200 bytes. If I'm
> reading 5000 rows from each key space and have a total of 100 key
> spaces, that's 100*5000*200 = 100,000,000 B = 100 MB. This is spread
> across 5 servers with 16 GB of RAM each, out of which 12.5 GB are
> allocated to the region servers.
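>
> (And assuming the default hfile.block.cache.size of 0.2, each region
> server should have roughly 12.5 GB * 0.2 = 2.5 GB of block cache, so
> a ~100 MB hot set really should fit entirely in cache.)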
>
> -eran
