hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: random timestamp insert
Date Tue, 16 Jun 2009 20:36:53 GMT
The IPC threading can become an issue on a really busy server.  There are by
default 10 IPC listener threads; once you have 10 concurrent operations, the
next one must wait for one of them to finish.  You can raise this if it ends up
becoming a problem.  It has to be bounded, though, or else resource consumption
would eventually crash the server.
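
For what it's worth, a minimal sketch of raising that limit in hbase-site.xml -
the property is, I believe, hbase.regionserver.handler.count; the value 30 is
only an illustration, check hbase-default.xml for your version:

  <property>
    <name>hbase.regionserver.handler.count</name>
    <!-- Number of IPC handler threads per regionserver; the default is 10. -->
    <!-- 30 is an illustrative value, not a recommendation. -->
    <value>30</value>
  </property>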

The only area where this becomes a problem is explicit row locking - if you
take out a lock in one client and a different client then comes to get the same
lock, the second client has to wait, and while waiting it consumes an IPC
thread.
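
(For context, explicit row locking here means the lockRow()/unlockRow() calls
on HTable.  A minimal sketch, assuming the 0.20-style client API; the table and
column names are made up:)

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.RowLock;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ExplicitRowLockSketch {
    public static void main(String[] args) throws Exception {
      HTable table = new HTable(new HBaseConfiguration(), "mytable");
      byte[] row = Bytes.toBytes("row1");

      // While this lock is held, any other client asking for the same row
      // lock blocks inside the regionserver, tying up an IPC handler thread.
      RowLock lock = table.lockRow(row);
      try {
        Put p = new Put(row, lock);
        p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
        table.put(p);
      } finally {
        table.unlockRow(lock);
      }
    }
  }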

But you shouldn't need to use explicit row locking.
- Mutations (puts, deletes) take out a row lock internally and then release it.
- There is checkAndSave(), which gives you a form of optimistic concurrency.
- You can use the multi-version mechanism to test for optimistic lock failure.
- atomicIncrement allows you to maintain sequences/counters without the use
of locks (a rough sketch of this and of checkAndSave-style updates follows
below).
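
As referenced above, a rough sketch of checkAndSave-style updates and of
atomicIncrement, assuming the 0.20-style client API, where these appear as
checkAndPut() and incrementColumnValue(); the table and column names are made
up:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class LockFreeSketch {
    public static void main(String[] args) throws Exception {
      HTable table = new HTable(new HBaseConfiguration(), "mytable");
      byte[] row = Bytes.toBytes("row1");
      byte[] cf  = Bytes.toBytes("cf");

      // Counter/sequence without a lock: the regionserver applies the
      // increment atomically and returns the new value.
      long next = table.incrementColumnValue(row, cf, Bytes.toBytes("seq"), 1);

      // Optimistic concurrency: the Put is applied only if cf:status still
      // equals "pending"; otherwise checkAndPut returns false and the client
      // can re-read and retry.
      Put put = new Put(row);
      put.add(cf, Bytes.toBytes("status"), Bytes.toBytes("done"));
      boolean applied = table.checkAndPut(row, cf, Bytes.toBytes("status"),
          Bytes.toBytes("pending"), put);

      System.out.println("next=" + next + ", applied=" + applied);
    }
  }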

I would recommend against designing a schema/application that uses row locks.
Use one of the other excellent mechanisms provided.  If your needs are
really above and beyond those, let's talk in detail.  A column-oriented store
has all sorts of powerful things available to it that an RDBMS doesn't have.

On Tue, Jun 16, 2009 at 1:22 PM, Alexandre Jaquet <alexjaquet@gmail.com> wrote:

> Thanks Ryan for your explanation,
>
> But as I understand it, can IPC calls generate deadlock through
> over-consumption of service threads?  What is the exact role of a region
> server?
>
> Thanks again.
>
> 2009/6/16 Ryan Rawson <ryanobjc@gmail.com>
>
> > Hey,
> >
> > So the issue there was that when you are using the built-in row-lock
> > support, the waiters for a row lock each use up an IPC responder thread.
> > There are only so many of them.  Then your clients start failing because
> > regionservers are busy waiting for locks to be released.
> >
> > The suggestion there was to use zookeeper-based locks.  The suggestion is
> > still valid.
> >
> > I don't get your question about whether a timestamp is better than "Long
> > versioning".  A timestamp is a long - its default value is
> > System.currentTimeMillis(), so it is the milliseconds since the 1970 epoch
> > - a slight variation on time_t.
> >
> > Generally I would recommend that people avoid setting timestamps unless
> > they have special needs.  Timestamps order the multiple versions of a given
> > row/column, so if you 'mess it up', you get wrong data returned.
> >
> > I personally believe that timestamps are not necessarily the best way to
> > store time-series data.  While in 0.20 we have better query mechanisms
> > ("all values between X and Y" is the general mechanism), you can probably
> > do better with indexes.
> >
> > -ryan
> >
> > On Tue, Jun 16, 2009 at 1:04 PM, Alexandre Jaquet <alexjaquet@gmail.com>
> > wrote:
> >
> > > Hello,
> > >
> > > I'm also evaluating hbase for some applications and found an old post
> > > about transactions and concurrent access:
> > >
> > > http://osdir.com/ml/java.hadoop.hbase.user/2008-05/msg00169.html
> > >
> > > Is a timestamp really better than Long versioning?
> > >
> > > Any workaround?
> > >
> > > 2009/6/16 Xinan Wu <wuxinan@gmail.com>
> > >
> > > > I am aware that inserting data into hbase with random timestamp order
> > > > results in indeterminate results.
> > > >
> > > > e.g. comments here
> > > > https://issues.apache.org/jira/browse/HBASE-1249#action_12682369
> > > >
> > > > I've personally experienced indeterminate results before when I insert
> > > > in random timestamp order (i.e., multiple versions with the same
> > > > timestamp in the same cell, out-of-order timestamps when getting
> > > > multiple versions).
> > > >
> > > > In other words, we don't want to go back in time when inserting cells.
> > > > Deletion is ok. But is updating pretty much the same story as
> > > > inserting?
> > > >
> > > > i.e., if I make sure the timestamp already exists in the cell, and then
> > > > I _update_ it with that timestamp (and the same value length),
> > > > sometimes hbase still just inserts a new version without touching the
> > > > old one, and of course the timestamps of this cell become out of order.
> > > > Even if I delete all versions in that cell and reinsert in time order,
> > > > the result is still out of order. I assume that if I do a major compact
> > > > between the delete-all and the reinsert it would be ok, but that's not
> > > > a good solution. Is there any good way to update a version of a cell in
> > > > the past? Or will that simply not work?
> > > >
> > > > Thanks,
> > > >
> > >
> >
>
