hbase-user mailing list archives

From Jim Kellerman <...@powerset.com>
Subject RE: Is the latest version of Hbase support multiple updates on same row at the same time?
Date Fri, 18 Apr 2008 18:51:56 GMT
> -----Original Message-----
> From: news [mailto:news@ger.gmane.org] On Behalf Of Zhou
> Sent: Thursday, April 17, 2008 7:39 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Is the latest version of Hbase support multiple
> updates on same row at the same time?
>
> Jim Kellerman <jim@...> writes:
>
> >
> > I'm not sure what you mean by server,
> > but any particular row is only served
> > by one HBase server. Multiple clients can submit batch updates for the
> > same row and they will all be handled by a single HBase server.
> >
>
> When I say server, I actually mean machine.
> There could be multiple clients running on different machines.
> Two clients might submit batch updates for the
> same row to the same HBase server at the same time.
> In that case, the HBase server
> would execute the batch updates from one client first.
> The other client would wait for all of them to finish, rather
> than get an exception.
> Is that right?

Correct.
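
The blocking behavior confirmed above can be pictured with a small toy model. This is only a sketch under assumptions, not HBase source code: ToyRowServer, commit, read, and runTwoClients are hypothetical names, and a ReentrantLock per row stands in for whatever row locking the region server really uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Toy model only, not HBase source code. All names are hypothetical.
class ToyRowServer {
    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    private final Map<String, List<String>> rows = new ConcurrentHashMap<>();

    // Applies a whole batch while holding the lock for that row.
    void commit(String row, List<String> updates) {
        ReentrantLock lock = rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
        lock.lock(); // a second client blocks here; it does not get an exception
        try {
            List<String> applied = rows.computeIfAbsent(row, r -> new ArrayList<>());
            for (String update : updates) {
                applied.add(update);
                Thread.yield(); // invite interleaving; the row lock prevents it
            }
        } finally {
            lock.unlock();
        }
    }

    // Returns a copy of the updates applied to a row so far.
    List<String> read(String row) {
        ReentrantLock lock = rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
        lock.lock();
        try {
            List<String> copy = new ArrayList<>();
            List<String> applied = rows.get(row);
            if (applied != null) {
                copy.addAll(applied);
            }
            return copy;
        } finally {
            lock.unlock();
        }
    }

    // Two "clients" (threads) commit a batch for the same row at the same time.
    static List<String> runTwoClients() {
        ToyRowServer server = new ToyRowServer();
        Thread a = new Thread(() -> server.commit("row1", Arrays.asList("a1", "a2", "a3")));
        Thread b = new Thread(() -> server.commit("row1", Arrays.asList("b1", "b2", "b3")));
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return server.read("row1");
    }
}
```

Whichever thread takes the lock first has its three updates applied as one contiguous group, and the other thread's updates follow. The order of the two groups is not determined.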

> > > Each of them has one BatchUpdate class of their own. I doubt it
> > > would still cause the "update in progress" exception.
> >
> > In 0.16 (and also in the hbase-0.1.x releases) the client API supports
> > only one batch update operation at a time. So if a single thread did
> > two startUpdate calls or if multiple threads did a single startUpdate
> > call, you will get the "update in progress"
> > exception.
> >
> > This has changed in HBase trunk. A single thread or multiple threads
> > can create a separate BatchUpdate object for each row they want to
> > update. When all the changes have been added to the BatchUpdate, it is
> > sent to the server by calling
> > HTable.commit(BatchUpdate)
> >
>
> I misunderstood the reason for the "update in progress"
> exception before.
> I thought it disallowed two simultaneous startUpdate calls on the same
> row.
> In fact, as you have explained,
> it disallows two simultaneous startUpdate calls on any rows.

Yes. This is a client-side problem in 0.16 and 0.1.x.
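
The difference between the two client styles can be sketched with a toy model. It is an illustration under assumptions, not the HBase client: ToyOldClient, ToyBatchUpdate, and ToyNewClient are hypothetical stand-ins, and only the names startUpdate, commit, BatchUpdate, and the "update in progress" message come from this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the two client styles; not HBase source code.

// Old style: the client object itself holds the single update in progress.
class ToyOldClient {
    private String rowInProgress; // null when no update is open

    synchronized void startUpdate(String row) {
        if (rowInProgress != null) {
            // Rejected no matter which row the second call names.
            throw new IllegalStateException("update in progress");
        }
        rowInProgress = row;
    }

    synchronized void commit() {
        rowInProgress = null;
    }
}

// New style: each update carries its own state in a separate object.
class ToyBatchUpdate {
    final String row;
    final Map<String, String> puts = new LinkedHashMap<>();

    ToyBatchUpdate(String row) {
        this.row = row;
    }

    void put(String column, String value) {
        puts.put(column, value);
    }
}

class ToyNewClient {
    // Stands in for the data held by the server after a commit.
    final Map<String, Map<String, String>> committed = new LinkedHashMap<>();

    // Any number of batches can be built independently and committed in turn.
    synchronized void commit(ToyBatchUpdate batch) {
        committed.computeIfAbsent(batch.row, r -> new LinkedHashMap<>()).putAll(batch.puts);
    }
}
```

Because the old-style client keeps one shared "in progress" slot, a second startUpdate fails even for a different row. With one object per update there is no shared slot left to collide on.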

> >
> > Not sure I understand the problem. The updates collected in a
> > BatchUpdate are sent via a single RPC call. The row gets locked on the
> > server and each update is written to the redo log before it is cached.
> > When the cache fills it is flushed to disk. If the server crashes
> > before the cache is flushed, the data can be recovered from the redo
> > log.
> >
>
> So on the client side, the commit operation returns after the RPC
> call to the server has returned.
> By the time commit returns, the redo log entries have already been
> written to disk.
> Am I right?

Correct
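
The ordering described above (redo log first, cache second, recovery by replaying the log) can be sketched as a toy model. This is an assumption-laden illustration, not HBase code: ToyRegionStore, commit, and recover are hypothetical names, and an in-memory list stands in for the redo log on disk.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of log-then-cache ordering; not HBase source code.
class ToyRegionStore {
    // Stands in for the redo log on disk; assumed to survive a crash.
    final List<String[]> redoLog;
    // In-memory cache; assumed to be lost on a crash.
    final Map<String, String> cache = new LinkedHashMap<>();

    ToyRegionStore(List<String[]> redoLog) {
        this.redoLog = redoLog;
    }

    // Returning from this method corresponds to the commit RPC returning.
    void commit(String key, String value) {
        redoLog.add(new String[] {key, value}); // 1. write the redo log entry
        cache.put(key, value);                  // 2. only then update the cache
    }

    // After a crash, rebuild the cache by replaying the surviving log in order.
    static ToyRegionStore recover(List<String[]> survivingLog) {
        ToyRegionStore store = new ToyRegionStore(survivingLog);
        for (String[] entry : survivingLog) {
            store.cache.put(entry[0], entry[1]);
        }
        return store;
    }
}
```

A crash is modeled by discarding the store object and keeping only the log list. Replaying the list in order restores the latest value for every key that was committed before the crash.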

> If that is true, there is no problem of Durability any more.
>
> > > BatchUpdate would not work, at least for massive amounts of data or high
> > > load.
> >
> > Actually it works pretty well. We have several applications that have
> > tens of millions of rows on 10 to 20 servers that are storing tens of
> > gigabytes of data currently.
> >
> > One user loaded 1.3 billion rows into HBase as a test.
> >
>
> My misunderstanding of how the BatchUpdate class works led me to
> that argument. Glad that I'm wrong.
>
> > > I hope HBase could fix the problem in the near future.
> >
> > It is fixed in hbase trunk which has not yet been released.
> >
> > > Does any version of HBase allow concurrent updates where all we need
> > > to do is call table.commit(id)?
> >
> > There is no released version that supports this. It is only in hbase
> > trunk which will be released as hbase-0.2.0 in a few weeks.
> >
> > By the way, you know that HBase is now a subproject of Hadoop and now
> > has a separate svn repository? All development of hbase-0.1.x and
> > hbase-trunk happens there and not in the hadoop svn. You can find the
> > hbase source at:
> >
> > http://svn.apache.org/repos/asf/hadoop/hbase
> >
>
> I am currently doing research on hosting web application
> data in non-relational DBMSs.
> For web applications, concurrent access to data happens a lot!
> I really need the concurrent update feature to host a
> scalable web application on HBase.
> I am looking forward to the release of hbase-0.2.0. For now,
> I will try the current trunk version first.
>
> Thanks for the explanation.
> It helps me a lot!
