hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Performance test results
Date Mon, 28 Mar 2011 15:38:35 GMT
On Mon, Mar 28, 2011 at 3:16 AM, Eran Kutner <eran@gigya.com> wrote:
> I started with a basic insert operation, inserting rows each with a
> single column holding 1KB of data.
> Initially, when the table was empty, I was getting around 300 inserts
> per second with 50 writing threads. Then, when the region split and a
> second server was added, the rate suddenly jumped to 3000 inserts/sec
> per server, so ~6000 for the two servers. Over time, as more servers
> were added, the rate actually went down and stabilized at around 2000
> inserts/sec per server.

What if you ran your client on more than one server?

An insert is a single 1k cell?

Tell us more about your configs.  Are you using defaults?  If you
watch the logs during your upload, do you see much blocking?
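One quick way to check is to scan the regionserver log for the memstore "Blocking updates" message. A minimal sketch in Python, assuming the 0.90-era message wording (verify the exact phrasing against your own logs):

```python
# Count write-blocking events in a regionserver log.
# The "Blocking updates" wording is an assumption based on HBase
# 0.90-era HRegion code; the sample log lines below are fabricated.
import re

sample_log = (
    "2011-03-28 10:01:02,345 INFO org.apache.hadoop.hbase.regionserver.HRegion: "
    "Blocking updates for 'IPC Server handler 3' on region t1,,130130: "
    "memstore size 256.0m is >= than blocking 256.0m size\n"
    "2011-03-28 10:01:05,678 INFO org.apache.hadoop.hbase.regionserver.HRegion: "
    "Unblocking updates for region t1,,130130\n"
)

blocked = [l for l in sample_log.splitlines() if re.search(r"Blocking updates", l)]
print(len(blocked))  # each hit is a stretch where writers were stalled
```

On a live cluster you would run the same pattern over the real log file instead of the sample string; frequent hits during the upload would explain a write-rate ceiling.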

> I also conducted a random column read test, where I read different
> numbers of columns from randomly selected rows. First I tested reading
> only one specific column (the first in each row). It started at around
> 60 r/s per server and gradually (I assume as more data was loaded into
> the cache) increased to ~800 r/s per server.

You can check the regionserver log.  It emits a cache stats log line
every so often.  Check cache hit rate percentage.
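If it helps, the hit ratio can be pulled out of that stats line programmatically. A sketch, assuming the 0.90-style `hitRatio=NN.NN%` field (the exact line format varies by version, so check your own log first):

```python
# Extract the block cache hit ratio from an LruBlockCache stats line.
# The line below is a fabricated sample in the assumed 0.90-era format.
import re

stats_line = ("2011-03-28 10:05:00,123 DEBUG "
              "org.apache.hadoop.hbase.io.hfile.LruBlockCache: "
              "Stats: total=120.5MB, free=80.2MB, max=200.7MB, blocks=1800, "
              "accesses=52000, hits=41600, hitRatio=80.00%")

m = re.search(r"hitRatio=([\d.]+)%", stats_line)
if m:
    print(float(m.group(1)))  # anything well under 100% means reads still hit disk
```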

> When reading 5 random
> columns from each row the rate dropped to around 400 rows/sec and when
> fetching full rows (each with 100 columns) the rate remained about the
> same, at 400 rows/sec per server.

100 columns in a row is 100k, right?

> I'm not sure exactly what I should expect, but I was hoping for much
> higher numbers. I read somewhere that for small data it is reasonable
> to expect 10K inserts per core per second. I know 1KB isn't small, but
> these are 8-core machines and they are doing only about 2K inserts/sec.
> Also, the read rate is very low considering all the data should fit in
> RAM. The interesting thing is that there doesn't seem to be any
> resource bottleneck: IO utilization on the servers is negligible and
> CPU is around 40-50% utilization. The client generating the load is
> not loaded either (around 5% CPU utilization). Client network was at
> 30% utilization when reading full rows. So the only reason left for
> the flat-lining is some sort of lock contention. Does this make sense?
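For what it's worth, the quoted rates can be sanity-checked with quick arithmetic (figures are from the thread; the gigabit client NIC is my assumption):

```python
# Back-of-the-envelope check of the rates quoted above.
cores = 8
observed_inserts_per_sec = 2000          # per server
cell_bytes = 1024                        # 1KB per insert

# Write bandwidth implied by the observed insert rate:
print(observed_inserts_per_sec * cell_bytes / 1e6)  # ~2 MB/s, far from disk/NIC limits

# The cited rule of thumb (10K inserts/core/sec) would predict:
print(10_000 * cores)                    # 80000/sec, a 40x gap vs. observed

# Full-row reads: 400 rows/sec x 100 columns x 1KB each
print(400 * 100 * cell_bytes * 8 / 1e9)  # ~0.33 Gb/s, consistent with the
                                         # reported ~30% client network use
```

The gap between ~2 MB/s of actual write traffic and idle hardware is what makes a software bottleneck plausible.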

This could be the case.  If you jstack during the reads, what are you
seeing?  Are servers locked up waiting to pass a synchronization point
or waiting on a lock?
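A cheap way to make sense of a pile of jstack output is to tally thread states. A sketch over a fabricated two-thread dump in the standard HotSpot format; in practice you would feed it real `jstack <pid>` output captured during the reads:

```python
# Tally java.lang.Thread.State lines in a jstack dump to spot contention.
from collections import Counter
import re

sample_dump = (
    '"IPC Server handler 1" prio=10 tid=0x01 nid=0x1a waiting for monitor entry\n'
    '   java.lang.Thread.State: BLOCKED (on object monitor)\n'
    '"IPC Server handler 2" prio=10 tid=0x02 nid=0x1b runnable\n'
    '   java.lang.Thread.State: RUNNABLE\n'
)

states = Counter(re.findall(r"java\.lang\.Thread\.State: (\w+)", sample_dump))
print(dict(states))  # many BLOCKED handlers would point at one shared lock
```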

