hbase-user mailing list archives

From lars hofhansl <la...@apache.org>
Subject Re: hbase row locking
Date Sun, 28 Sep 2014 20:32:55 GMT
> Is anyone aware of any company that does not
> use the HDFS default policy and flushes on every WAL sync?

It's a trade-off. You'll only lose data when the wrong three machines die around the same
time (you'd have an outage for any block that exists only on those three boxes). Note also
that the time the data is not on disk is bounded: eventually the OS flushes the dirty pages;
Linux does it every 15s by default.

So all three machines would have to die before a single one of them manages to flush the dirty
pages to disk. Of course that can happen, for example during a data center power outage.
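The bounded window mentioned above comes from the kernel's periodic writeback of dirty pages. As a rough sketch, assuming the two real Linux sysctls vm.dirty_expire_centisecs (how long a page must be dirty before the flusher writes it) and vm.dirty_writeback_centisecs (how often the flusher wakes), the worst-case window can be estimated as one expiry age plus one wakeup interval. That formula is a simplification assumed here (it ignores I/O time and memory-pressure writeback), and the helper names are hypothetical. The fallback of 3000 and 500 matches common kernel defaults; the real values on a given host may differ.

```python
from typing import Optional


def dirty_page_window_s(expire_centisecs: int, writeback_centisecs: int) -> Optional[float]:
    """Rough worst-case time (seconds) a dirty page may wait for periodic writeback.

    Model assumed here: a page becomes eligible for writeout once it has been
    dirty for expire_centisecs, and the flusher thread wakes every
    writeback_centisecs, so the page may wait almost one extra wakeup interval.
    I/O time and memory-pressure writeback are ignored.
    Returns None when periodic writeback is disabled (interval of 0).
    """
    if writeback_centisecs == 0:
        return None  # no periodic flusher wakeups, so no time bound from this model
    return (expire_centisecs + writeback_centisecs) / 100.0


def read_vm_knob(name: str) -> int:
    """Read an integer Linux vm sysctl such as 'dirty_expire_centisecs'."""
    with open(f"/proc/sys/vm/{name}") as f:
        return int(f.read().strip())


if __name__ == "__main__":
    try:
        expire = read_vm_knob("dirty_expire_centisecs")
        writeback = read_vm_knob("dirty_writeback_centisecs")
    except OSError:
        # Not on Linux (or /proc unavailable): assumed fallback values.
        expire, writeback = 3000, 500
    window = dirty_page_window_s(expire, writeback)
    if window is None:
        print("periodic writeback disabled; no time bound")
    else:
        print(f"dirty pages reach disk within roughly {window:.1f} s")
```

Shortening either tunable narrows the window in which all three replicas exist only in memory, at the cost of more frequent background I/O.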

A while ago I added HDFS-744 to HDFS, but never finished the parts in HBase, as nobody (including
myself, in the end) was interested in it. That reminds me to maybe take this up again in HBase 2.0,
since we now support fewer versions of Hadoop.
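The choice being discussed is whether each WAL sync only pushes data into the DataNodes' memory or forces it all the way to disk (Hadoop's FSDataOutputStream exposes these as hflush() and hsync()). As a local analogy only, using plain Python file I/O rather than any HDFS or HBase API, the sketch below shows the same distinction on a single machine: flush() hands bytes to the OS page cache, while os.fsync() waits for them to be written to the storage device. The function append_wal_entry is hypothetical.

```python
import os
import tempfile


def append_wal_entry(path: str, entry: bytes, force_to_disk: bool) -> None:
    """Append one record to a local log file (a stand-in for a WAL).

    After flush() the bytes are in the OS page cache: they survive a process
    crash but can be lost if the machine loses power before writeback.
    os.fsync() additionally waits until the kernel has written the file's
    data to the storage device, which is the per-append cost of
    'flush on every WAL sync'.
    """
    with open(path, "ab") as f:
        f.write(entry)
        f.flush()                 # user-space buffer -> OS page cache
        if force_to_disk:
            os.fsync(f.fileno())  # page cache -> storage device


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "wal.log")
        append_wal_entry(log, b"put row1\n", force_to_disk=False)
        append_wal_entry(log, b"put row2\n", force_to_disk=True)
        with open(log, "rb") as f:
            print(f.read())
```

Timing many appends with force_to_disk=True versus False on the same disk gives a feel for why forcing every sync to disk is not the default.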

When HDFS gets tiered storage, we can revive this and put HBase's WAL on SSD storage.

-- Lars

----- Original Message -----
From: abhishek1015 <abhishek1015@gmail.com>
To: user@hbase.apache.org
Sent: Sunday, September 28, 2014 9:13 AM
Subject: Re: hbase row locking

Sorry for the confusion. I meant that I am getting 6000 ops/sec throughput
overall using 4 machines, i.e. 1500 ops/sec per regionserver on average.

I checked the ping response time between machines. It is approximately 0.09 ms.

Assuming that the WAL sync thread syncs with the two other HDFS nodes
sequentially, the row lock will be held for at least 0.18 ms, which would
still give a very high throughput per regionserver even if only one thread
is working and all other threads are blocked on the lock.
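The back-of-the-envelope reasoning above can be checked with a few lines of Python. It uses the figures from this message (6000 ops/sec across 4 machines, a 0.09 ms ping) and the author's assumption that a WAL sync costs one round trip to each of two other HDFS nodes in sequence while the row lock is held; the helper names are illustrative.

```python
def per_server_throughput(total_ops_per_s: float, servers: int) -> float:
    """Average ops/s per regionserver, assuming load is spread evenly."""
    return total_ops_per_s / servers


def serialized_ceiling(lock_hold_ms: float) -> float:
    """Max ops/s if every operation holds the same lock for lock_hold_ms
    and operations run strictly one after another."""
    return 1000.0 / lock_hold_ms


ping_rtt_ms = 0.09   # measured round trip between machines
replica_hops = 2     # assumption: two other HDFS nodes synced sequentially
lock_hold_ms = ping_rtt_ms * replica_hops      # 0.18 ms

observed = per_server_throughput(6000, 4)      # 1500 ops/s per regionserver
ceiling = serialized_ceiling(lock_hold_ms)     # about 5556 ops/s
print(f"observed {observed:.0f} ops/s, lock-bound ceiling {ceiling:.0f} ops/s")
```

Even with fully serialized writes, the lock-bound ceiling is several times the observed 1500 ops/sec per regionserver, which is the basis for concluding that the lock itself is not the limiting factor.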

It appears, then, that the bottleneck is the HDFS disk flush, and consequently
the schemas mentioned above are equivalent with respect to performance.

However, I have a question regarding the default HDFS policy of not flushing
to disk on every WAL sync. Aren't people in industry afraid of data loss, however
small the probability of it happening? Is anyone aware of any company that does not
use the HDFS default policy and flushes on every WAL sync?


View this message in context: http://apache-hbase.679495.n3.nabble.com/hbase-row-locking-tp4064432p4064458.html

Sent from the HBase User mailing list archive at Nabble.com.
