hbase-user mailing list archives

From Alexandre Jaquet <alexjaq...@gmail.com>
Subject Re: random timestamp insert
Date Tue, 16 Jun 2009 20:52:11 GMT
Sorry, I meant if we have a conflict, and not if we are in non-optimistic mode.

2009/6/16 Alexandre Jaquet <alexjaquet@gmail.com>

>
> checkAndSave looks nice, but
>
> optimistic concurrency control is based on the assumption that most database
> transactions <http://en.wikipedia.org/wiki/Database_transaction> don't
> conflict with other transactions
>
> That covers most cases, but what happens if we are in a non-optimistic mode?
>
>
>
> 2009/6/16 Ryan Rawson <ryanobjc@gmail.com>
>
>> The IPC threading can become an issue on a really busy server.  There are
>> by default 10 IPC listener threads; once you have 10 concurrent operations,
>> you must wait for one to end before starting the next.  You can raise this
>> if it ends up becoming a problem.  It has to be bounded, or else resource
>> consumption will eventually crash the server.
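The queueing effect Ryan describes can be simulated with a plain fixed thread pool. This is only an illustration of bounded handlers, not HBase's actual IPC code; the pool size stands in for the configurable listener-thread count (default 10) he mentions:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedHandlers {
    // Runs `requests` tasks on `handlerCount` threads and reports the
    // highest number that ever ran at the same time.
    static int peakConcurrency(int handlerCount, int requests) throws Exception {
        ExecutorService handlers = Executors.newFixedThreadPool(handlerCount);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(requests);
        for (int i = 0; i < requests; i++) {
            handlers.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try { Thread.sleep(50); } catch (InterruptedException ignored) { }
                running.decrementAndGet();
                done.countDown();
            });
        }
        done.await();
        handlers.shutdown();
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        // With 2 handlers and 5 requests, at most 2 run at once; the
        // rest wait, just as callers wait for a free IPC listener thread.
        System.out.println("peak concurrency: " + peakConcurrency(2, 5));
    }
}
```

The pool must be bounded for exactly the reason in the mail above: an unbounded thread count would let resource consumption grow until the process dies.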
>>
>> The only area this becomes a problem is explicit row locking - if you take
>> out a lock in one client, then a different client comes to get the same
>> lock, the second client has to wait, and while waiting it consumes an IPC
>> thread.
>>
>> But you shouldn't need to use explicit row locking.
>> - Mutations (puts, deletes) take out a row lock and then release it.
>> - There is a checkAndSave() which gives you a form of optimistic
>> concurrency.
>> - You can use the multi-version mechanism to test for optimistic lock
>> failure.
>> - atomicIncrement allows you to maintain sequences/counters without the
>> use of locks.
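The checkAndSave() mechanism in the list above is a server-side compare-and-set. The same optimistic pattern can be sketched in plain Java, with an AtomicReference standing in for the HBase cell; this is an illustration of the read/compute/conditional-write loop, not the HBase client API:

```java
import java.util.concurrent.atomic.AtomicReference;

public class OptimisticUpdate {
    // Hypothetical in-memory "cell"; in HBase the compare-and-set
    // happens on the regionserver, with no client-held lock.
    static final AtomicReference<String> cell = new AtomicReference<>("v1");

    // Read, compute, then write only if the cell still holds what we read.
    // On conflict, retry: the optimistic assumption is that conflicts are rare.
    static String appendWithRetry(String suffix) {
        while (true) {
            String expected = cell.get();        // read current value
            String updated = expected + suffix;  // compute new value
            if (cell.compareAndSet(expected, updated)) {
                return updated;                  // no conflict: committed
            }
            // Another writer won the race; loop and try again.
        }
    }

    public static void main(String[] args) {
        System.out.println(appendWithRetry("-x")); // prints "v1-x"
    }
}
```

No thread ever blocks holding a lock here, which is why this pattern does not tie up IPC threads the way explicit row locking does.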
>>
>> I would recommend against designing a schema/application that uses row
>> locks.  Use one of the other excellent mechanisms provided.  If your needs
>> really go beyond those, let's talk in detail.  A column-oriented store has
>> all sorts of powerful things available to it that RDBMSs don't have.
>>
>> On Tue, Jun 16, 2009 at 1:22 PM, Alexandre Jaquet <alexjaquet@gmail.com
>> >wrote:
>>
>> > Thanks Ryan for your explanation,
>> >
>> > But as I understand it, can IPC calls generate deadlocks through
>> > over-consumption of service threads?  What is the exact role of a region
>> > server?
>> >
>> > Thanks again.
>> >
>> > 2009/6/16 Ryan Rawson <ryanobjc@gmail.com>
>> >
>> > > Hey,
>> > >
>> > > So the issue there was when you are using the row-lock support built
>> in,
>> > > the
>> > > waiters for a row lock use up a IPC responder thread. There is only so
>> > many
>> > > of them. Then your clients start failing as regionservers are busy
>> > waiting
>> > > for locks to be released.
>> > >
>> > > The suggestion there was to use zookeeper-based locks.  The suggestion
>> is
>> > > still valid.
>> > >
>> > > I don't get your question about whether timestamp is better than "Long
>> > > versioning".  A timestamp is a long: its default value is
>> > > System.currentTimeMillis(), i.e. the milliseconds since the 1970 epoch,
>> > > a slight variation on the classic time_t.
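Ryan's point that a timestamp is just a long can be checked directly; a minimal demonstration:

```java
public class TimestampDemo {
    public static void main(String[] args) {
        // HBase's default cell timestamp is System.currentTimeMillis():
        // a plain long counting milliseconds since the 1970 epoch (UTC).
        long ts = System.currentTimeMillis();
        System.out.println("millis since epoch:  " + ts);
        // Dividing by 1000 gives seconds since the epoch, the classic time_t.
        System.out.println("seconds since epoch: " + (ts / 1000));
    }
}
```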
>> > >
>> > > Generally I would recommend people avoid setting timestamps unless
>> > > they have special needs.  Timestamps order the multiple versions of a
>> > > given row/column, so if you 'mess it up', you get wrong data returned.
>> > >
>> > > I personally believe that timestamps are not necessarily the best way
>> > > to store time-series data.  While in 0.20 we have better query
>> > > mechanisms (all values between X and Y is the general mechanism), you
>> > > can probably do better with indexes.
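One common way to "do better" than cell timestamps, along the lines Ryan hints at, is to put the time into the row key itself, often reversed so a scan returns newest readings first. The key format below is an assumption for illustration, not something specified in this thread:

```java
public class TimeSeriesKey {
    // Builds a row key "<seriesId>:<reversedTs>" where
    // reversedTs = Long.MAX_VALUE - ts, zero-padded to a fixed width so
    // that lexicographic row ordering puts the newest reading first.
    static String rowKey(String seriesId, long ts) {
        return String.format("%s:%019d", seriesId, Long.MAX_VALUE - ts);
    }

    public static void main(String[] args) {
        String older = rowKey("sensor-7", 1000L);
        String newer = rowKey("sensor-7", 2000L);
        // The newer reading sorts lexicographically before the older one,
        // so a scan from the series prefix sees newest-first.
        System.out.println(newer.compareTo(older) < 0); // prints "true"
    }
}
```

Each reading gets its own row, so range scans ("all values between X and Y") fall out of plain row ordering rather than per-cell version queries.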
>> > >
>> > > -ryan
>> > >
>> > > On Tue, Jun 16, 2009 at 1:04 PM, Alexandre Jaquet <
>> alexjaquet@gmail.com
>> > > >wrote:
>> > >
>> > > > Hello,
>> > > >
>> > > > I'm also evaluating HBase for some applications and found an old
>> > > > post about transactions and concurrent access:
>> > > >
>> > > > http://osdir.com/ml/java.hadoop.hbase.user/2008-05/msg00169.html
>> > > >
>> > > > Is timestamp really better than Long versioning?
>> > > >
>> > > > Any workaround?
>> > > >
>> > > > 2009/6/16 Xinan Wu <wuxinan@gmail.com>
>> > > >
>> > > > > I am aware that inserting data into HBase in random timestamp
>> > > > > order results in indeterminate results.
>> > > > >
>> > > > > e.g. comments here
>> > > > > https://issues.apache.org/jira/browse/HBASE-1249#action_12682369
>> > > > >
>> > > > > I've personally experienced indeterminate results before when I
>> > > > > insert in random timestamp order (i.e., multiple versions with the
>> > > > > same timestamp in the same cell, out-of-order timestamps when
>> > > > > getting multiple versions).
>> > > > >
>> > > > > In other words, we don't want to go back in time when inserting
>> > > > > cells.  Deletion is OK.  But is updating pretty much the same story
>> > > > > as inserting?
>> > > > >
>> > > > > I.e., if I make sure the timestamp does exist in the cell, and then
>> > > > > I _update_ it with that timestamp (and the same value length),
>> > > > > sometimes HBase still just inserts a new version without touching
>> > > > > the old one, and of course the timestamps of this cell become out of
>> > > > > order.  Even if I delete all versions in that cell and reinsert in
>> > > > > time order, the result is still out of order.  I assume that if I do
>> > > > > a major compact between deleting all and reinserting, it would be
>> > > > > OK, but that's not a good solution.  Is there any good way to update
>> > > > > a version of a cell in the past?  Or does that simply not work?
>> > > > >
>> > > > > Thanks,
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
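The ambiguity Xinan describes, two versions sharing one timestamp, can be seen in a toy model of a cell's version map. This sketch is not HBase internals; it only illustrates that versions are keyed and ordered by timestamp, so equal timestamps have no defined relative order:

```java
import java.util.TreeMap;

public class VersionOrder {
    // Toy model of a single cell: versions sorted newest-first by timestamp.
    static TreeMap<Long, String> cellVersions() {
        TreeMap<Long, String> versions =
                new TreeMap<>((a, b) -> Long.compare(b, a)); // descending ts
        versions.put(100L, "first");
        versions.put(200L, "second");
        versions.put(100L, "rewrite"); // same timestamp as "first"
        return versions;
    }

    public static void main(String[] args) {
        // In this map the rewrite silently replaces "first".  In an HBase
        // store, both entries may coexist in the files until a major
        // compaction, and which one a read returns is undefined.
        System.out.println(cellVersions()); // prints {200=second, 100=rewrite}
    }
}
```

This is why the thread's advice is to always insert in forward timestamp order: the version index cannot disambiguate writes that land on the same timestamp.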
