hbase-user mailing list archives

From Eric Burin des Roziers <eric_...@yahoo.com>
Subject Re: put to WAL and scan/get operation concurrency
Date Thu, 05 May 2011 18:43:40 GMT
Hi Jean-Daniel,

Yes, I need a multi-row, transaction-aware HBase for the types of processing I need
to do.  I need to avoid having partial rows available, and I am in the process of selecting
a way to implement such transaction isolation.  I currently have 2 choices: (1) use
HBase-trx or (2) implement my own, leveraging the versioning that HBase provides.  In light
of this I wanted to understand the inner workings of HBase a little more.

For example, I want to understand whether scans read data from the MemStore even if it has
not yet been flushed to the HFiles.  HBase replicates the data 3 times (depending on your
configs).  Does it do that for the MemStore as well?  Say the client wants to insert 10
rows which happen to fall across 2 regions.  If region 2 fails, then another client will
still be able to read the rows inserted in region 1, but not those in region 2.  Since HBase
replicates data to other servers, the region 2 rows could be available on other servers, right?

The second aspect that I would like to understand is the implementation of HBase-trx.
It seems that I can still have a failure point when the transactional WAL (THLog) flushes
the data to the main WAL.  Using the above example, I can get into a situation where I will
only be able to read a subset of the 10 rows initially inserted.  Is that right?


From: Jean-Daniel Cryans <jdcryans@apache.org>
To: user@hbase.apache.org
Sent: Thursday, May 5, 2011 7:24 PM
Subject: Re: put to WAL and scan/get operation concurrency


On Thu, May 5, 2011 at 6:03 AM, Eric Burin des Roziers
<eric_bdr@yahoo.com> wrote:
> Hi,
> I am currently looking at adding a transactional consistency aspect to HBase and had
> 2 questions:
> 1. My understanding is that when the client performs an operation (put, delete, incr),
> it is sent to the region server which delegates it to different region servers, which in turn
> puts it in the WAL and the MemStore in that region.  At some point later, the MemStore is
> flushed to disk (into the HFiles).  The WAL is essentially there as a way to recover the
> data in case the machine crashes, hence losing data stored in its MemStore but not yet stored
> on disk.  Once the data is available in the MemStore (but not yet in HFiles), do scans and
> gets 'see' that data?  Is the data duplicated in the MemStore across 3 region servers?  If
> a region server crashes, can I get into a situation where a scan can return a partial data
> set without the client being aware of it?

Only one region server serves a region at a time. If that region
server crashes, the data is still available on other DataNodes, but
it's not available to the client until the WAL is replayed and the
region is reopened. So no stale data.
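
The behaviour described in this answer can be illustrated with a toy
in-memory model. This is only a sketch under simplifying assumptions, not
HBase code: the class ToyRegion and all of its methods are hypothetical,
and the real WAL and HFiles live on HDFS rather than in Python objects.

```python
class ToyRegion:
    """Toy model of one region: a WAL, a MemStore, and flushed files."""

    def __init__(self):
        self.wal = []        # durable log (stands in for the WAL on HDFS)
        self.memstore = {}   # in-memory edits, lost when the server crashes
        self.hfiles = {}     # flushed data (stands in for HFiles on HDFS)
        self.online = True   # only an online region serves reads and writes

    def put(self, row, value):
        if not self.online:
            raise RuntimeError("region offline")
        self.wal.append((row, value))  # log the edit first
        self.memstore[row] = value     # then make it visible to reads

    def get(self, row):
        if not self.online:
            raise RuntimeError("region offline")
        # Reads see unflushed MemStore data before falling back to flushed files.
        if row in self.memstore:
            return self.memstore[row]
        return self.hfiles.get(row)

    def flush(self):
        self.hfiles.update(self.memstore)
        self.memstore.clear()
        self.wal.clear()  # flushed edits no longer need to be replayed

    def crash(self):
        self.memstore.clear()  # in-memory state is gone
        self.online = False    # nothing is served until recovery

    def recover(self):
        for row, value in self.wal:  # replay the unflushed edits
            self.memstore[row] = value
        self.online = True
```

In this model a get issued between crash() and recover() fails outright
instead of returning a partial or stale result, which is the point made
above.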

> 2. The HBase-trx package implements transactions by effectively creating a WAL per transaction
> (THLog) and 'flushing' it to the main WAL (HLog) on commit.  But flushing this THLog will
> take some window of time (however small it is).  If a scan (or get) is performed during that
> window, could I get into a situation where I see part of the committed transaction (some rows
> but not others, since they have not been flushed yet)?  Why did HBase-trx decide to go with
> a THLog, instead of leveraging the KeyValue versioning?

I think you are confusing multi-row transactions and single-row
transactions. In pure HBase, every single-row transaction is ACID. You
can learn more here: http://hbase.apache.org/acid-semantics.html

The trx package does multi-row transactions.

> I am thinking of implementing a transaction isolation/consistency mechanism by storing
> a unique transaction id as the version when doing a put (instead of the current millis) and
> passing invalid transaction ids to scans/get letting them know to fetch a previous version
> (with a valid transaction id) for cells that have been updated by a non-committed transaction.
> Are there any reasons for not going with this approach?
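
A minimal sketch of the approach proposed in this question, written as a
toy in-memory model rather than against the HBase client API. The class
ToyVersionedStore and the invalid_txn_ids parameter are hypothetical names,
and the sketch assumes transaction ids increase monotonically, so that a
higher id means a newer version.

```python
class ToyVersionedStore:
    """Toy cell store that uses a transaction id as the cell version."""

    def __init__(self):
        # (row, column) -> {txn_id: value}
        self.cells = {}

    def put(self, row, column, txn_id, value):
        # The transaction id takes the place of the usual millisecond timestamp.
        self.cells.setdefault((row, column), {})[txn_id] = value

    def get(self, row, column, invalid_txn_ids=frozenset()):
        # Walk versions from newest to oldest and return the first one that
        # was not written by an invalid (uncommitted or aborted) transaction.
        versions = self.cells.get((row, column), {})
        for txn_id in sorted(versions, reverse=True):
            if txn_id not in invalid_txn_ids:
                return versions[txn_id]
        return None


store = ToyVersionedStore()
store.put("row1", "col", txn_id=1, value="committed")
store.put("row1", "col", txn_id=2, value="in flight")

# A reader that knows transaction 2 has not committed falls back to version 1.
visible = store.get("row1", "col", invalid_txn_ids={2})
```

The sketch leaves open the harder parts of the proposal: how readers learn
the current set of invalid ids, and how many old versions each cell has to
retain.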

So just to be sure, were my previous answers good enough to answer
your question, or are you trying to implement something like the
