hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Problems with write performance (25kb rows)
Date Tue, 05 Jan 2010 14:25:18 GMT
WRT your last 2 emails: HBase ships with defaults that work safely
for most users and are in no way tuned for a one-time upload.
Playing with the memstore size like you did makes sense.

Now, you said you were inserting with the row key being a reversed ts...
are all threads using the same key space when uploading? I ask
because if all 60 threads are almost always hitting the same region
(a different one over time), then all 60 threads are filling up the
same memstore really fast, then all wait for the snapshot, eventually
all wait for the same region split, and in the meantime fill the same
WAL, which will probably be rolled a few times. Is it the

You could also post a region server log for us to analyze.
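
To make the hot-spotting concern concrete, here is a minimal sketch (not
HBase code) that maps reversed-timestamp row keys onto a pre-split table.
Everything in it is an assumption for illustration: the key encoding
(zero-padded Long.MAX_VALUE minus the timestamp), the ten evenly spaced
split points, and the helper names. It only shows that rows stamped at
nearly the same moment sort next to each other and so land in one region.

```python
import bisect
from collections import Counter

MAX_LONG = 2**63 - 1  # Java Long.MAX_VALUE

def reversed_ts_key(ts_millis):
    """Assumed key encoding: zero-padded (MAX_LONG - timestamp)."""
    return str(MAX_LONG - ts_millis).zfill(19)

def region_for(key, split_points):
    """Index of the region that holds `key`, given sorted split points."""
    return bisect.bisect_right(split_points, key)

def regions_hit(timestamps, split_points):
    """Count how many rows fall into each region."""
    return Counter(region_for(reversed_ts_key(ts), split_points)
                   for ts in timestamps)

# Hypothetical table, pre-split into 10 regions covering one day.
DAY_START = 1262649600000          # 2010-01-05 00:00:00 UTC, in ms
DAY_MS = 24 * 60 * 60 * 1000
split_points = sorted(reversed_ts_key(DAY_START + i * DAY_MS // 10)
                      for i in range(1, 10))

# 60 writers, each inserting a row stamped within the same 60 ms.
now = DAY_START + 13 * 60 * 60 * 1000
same_moment = regions_hit([now + t for t in range(60)], split_points)

# For contrast: 60 rows whose timestamps are spread over the whole day.
spread_out = regions_hit([DAY_START + i * DAY_MS // 60 for i in range(60)],
                         split_points)

print("regions hit, same moment:", len(same_moment))
print("regions hit, spread out: ", len(spread_out))
```

If the uploaders really do share one key space this way, the first count
is what the region servers see at any instant, whatever the number of
client threads.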


On Tue, Jan 5, 2010 at 5:56 AM, Dmitriy Lyfar <dlyfar@gmail.com> wrote:
> Stack,
> I did some tests with different flushing parameters. I touched the
> following params:
> hbase.hregion.memstore.block.multiplier
> hbase.hregion.memstore.flush.size
> When I increased the flush size to 256Mb (64Mb by default), the time of
> the 25Kb test grew.
> When I changed block.multiplier to 12 (was 10), I gained about 40 seconds
> in the 25Kb test (was 150-160 secs, became ~110 secs for one running
> instance that inserts 100K records).
> All the tests I did were with WAL on.
> 2010/1/5 Dmitriy Lyfar <dlyfar@gmail.com>
>> Hello Stack,
>>> > And throughput without WAL is about 50 Mb/sec and about 15 Mb/sec with
>>> > WAL on. When I run clients in serial order (i.e. at any moment there is
>>> > only one working script), the time is almost stable and does not grow.
>>> >
>>> >
>>> > > See what the
>>> > > numbers are like uploading into a table that is pre-split?
>>> >
>>> >
>>> > Sorry, what do you mean by pre-split? You mean splitting regions
>>> > before running the script?
>>> >
>>> >
>>> I was thinking you were uploading into a new table and that the region
>>> splits were happening inline with your upload.  I was asking what the
>>> performance was like if the table had already had all its regions pre-made
>>> wondering if it ran faster but sounds like your table is already
>>> pre-split.
>>> So where are we at now?  You tried running multiple separate upload
>>> processes and it still runs too slow?
>> Yes, still too slow, especially with WAL on. Btw, I see that the greater
>> the row size, the greater the impact of the WAL. I'm not an expert in
>> hbase internals, but I'm beginning to think that the throughput drop in
>> the 25Kb case is connected with flushing. I mean, it looks like we begin
>> to flush too often and it impacts throughput.
>> Also, as I see from the architecture description, there could be several
>> reasons, like rolling the hlog too often and a long compaction period.
>> Could you advise which log messages in the region/master logs should warn
>> me that something is going wrong?
>> --
>> Regards, Lyfar Dmitriy
> --
> Regards, Lyfar Dmitriy
> mailto: dlyfar@crystalnix.com
> jabber: dlyfar@gmail.com
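
For reference, the figures quoted above can be cross-checked with a little
arithmetic. This is only a sketch under stated assumptions: "Kb" and "Mb"
are read as kilobytes and megabytes, each row is exactly 25 KB, and the
function names are made up for this example. The blocking threshold is
computed as flush size times block multiplier, which is the point at which
a region blocks updates.

```python
KB = 1024
MB = 1024 * KB

def upload_throughput_mb_s(rows, row_bytes, seconds):
    """Average client-side throughput of one upload run, in MB/s."""
    return rows * row_bytes / MB / seconds

def blocking_threshold_mb(flush_size_mb, block_multiplier):
    """Memstore size (MB) at which updates to a region are blocked:
    hbase.hregion.memstore.flush.size times
    hbase.hregion.memstore.block.multiplier."""
    return flush_size_mb * block_multiplier

ROWS = 100_000        # records per running instance, as quoted
ROW_BYTES = 25 * KB   # assumed exact row size

# Midpoint of the quoted 150-160 secs, and the quoted ~110 secs.
before = upload_throughput_mb_s(ROWS, ROW_BYTES, 155)
after = upload_throughput_mb_s(ROWS, ROW_BYTES, 110)

print(f"multiplier 10: {before:.1f} MB/s, "
      f"blocks at {blocking_threshold_mb(64, 10)} MB")
print(f"multiplier 12: {after:.1f} MB/s, "
      f"blocks at {blocking_threshold_mb(64, 12)} MB")
```

The first figure comes out close to the "about 15 Mb/sec with WAL on"
reported earlier in the thread, so the timings and the throughput numbers
are at least consistent with each other.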
