hbase-user mailing list archives

From Tatsuya Kawano <tatsuy...@snowcocoa.info>
Subject Re: Unique row ID constraint
Date Fri, 30 Apr 2010 16:31:35 GMT
Thanks all for your responses; they are very helpful.

4/30/2010 Todd Lipcon <todd@cloudera.com>:
> Note that your solution is not correct in the case of failure, since the
> check and put are not atomic with each other.
>
> If your client or server fails between the ICV and the put, no other clients
> will be able to put the row, but there will be no data.

I agree; my solution is a bit fragile. If I stick with this plan, I
could try to delete the counter after the put fails. However, the
delete probably won't work either, because the put failure could be
caused by a network disruption, a region server problem, etc. So, I'm
going to have to leave some kind of failure log, so I can remove the
reserved key later by hand.
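
For the record, here is a rough sketch of that cleanup path (same
family / uniqueQual names as in my code quoted below; logFailure is a
hypothetical hook for the failure log, not a real API):

```scala
import org.apache.hadoop.hbase.client.{Delete, HTable, Put}
import org.apache.hadoop.hbase.util.Bytes

def insertWithCleanup(table: HTable, put: Put): Unit = {
  // Reserve the key: ICV initializes the counter to 0 when absent,
  // so a return value of 1 means we created the row.
  val count = table.incrementColumnValue(put.getRow, family, uniqueQual, 1)
  if (count != 1) {
    throw new DuplicateRowException("Tried to insert a duplicate row: "
            + Bytes.toString(put.getRow))
  }
  try {
    table.put(put)
  } catch {
    case e: Exception =>
      try {
        // Best effort: release the reserved key by deleting the counter.
        table.delete(new Delete(put.getRow))
      } catch {
        // The delete can fail for the same reason the put did (network,
        // region server trouble), so record it for manual cleanup.
        case _: Exception => logFailure(put.getRow, e)
      }
      throw e
  }
}
```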


4/30/2010 Guilherme Germoglio <germoglio@gmail.com>:
> Can the keys be randomly generated, or must they be incremental? Remember
> that you can achieve higher throughput if they are randomly generated, since
> the insertions will possibly load all machines more evenly.
>
> Using UUIDs may ensure key uniqueness (I don't expect a UUID clash any
> time soon :-) and load balance over the cluster,

4/30/2010 Michael Segel <michael_segel@hotmail.com>:
> UUIDs wont clash. Especially if you're using version 5 which is a truncated SHA-1 hash of the UUID.

Thanks for the info. Well, in my case, I'd like to use a combination
of business data as the row key, so I can scan the rows. But I'll
keep the UUID option in mind for other cases.
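
As a sketch of what I mean (the field names here are made up for
illustration; Bytes is org.apache.hadoop.hbase.util.Bytes):

```scala
import org.apache.hadoop.hbase.util.Bytes

// Hypothetical business fields: a delimited, fixed-order layout keeps
// the keys sorted by customer, then date, so per-customer prefix scans
// work naturally.
def rowKey(customerId: String, orderDate: String, orderNo: Long): Array[Byte] =
  Bytes.add(Bytes.toBytes(customerId + "/" + orderDate + "/"),
            Bytes.toBytes(orderNo))
```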


4/30/2010 Guilherme Germoglio <germoglio@gmail.com>:
> but if you are paranoid enough you can
> also check whether a row already exists by using
> checkAndPut<http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/HTable.html#checkAndPut(byte[],
> byte[], byte[], byte[], org.apache.hadoop.hbase.client.Put)> (just check for
> an empty byte array value in a column that you can ensure always has
> some value).

So, checkAndPut() seems ideal for my case. I didn't realize I could use
it to check whether a row already exists. I'll give it a try!
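
Here is a rough sketch of how I think the checkAndPut() version would
look (the family / existsQual names are assumptions, as is comparing
against HConstants.EMPTY_BYTE_ARRAY to mean "this cell must be absent
or empty" -- I'll verify that against the 0.20 javadoc):

```scala
import org.apache.hadoop.hbase.HConstants
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes

// `family` / `existsQual` are assumed: a column that every existing row
// in this table is guaranteed to hold a non-empty value in.
def insert(table: HTable, put: Put): Unit = {
  // checkAndPut performs the check and the put atomically on the region
  // server; it returns false when the expected value doesn't match,
  // i.e. when the row already exists.
  val inserted = table.checkAndPut(put.getRow, family, existsQual,
                                   HConstants.EMPTY_BYTE_ARRAY, put)
  if (!inserted) {
    throw new DuplicateRowException("Tried to insert a duplicate row: "
            + Bytes.toString(put.getRow))
  }
}
```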


Thanks,
Tatsuya

-- 
河野 達也
Tatsuya Kawano (Mr.)
Tokyo, Japan

twitter: http://twitter.com/tatsuya6502

2010年4月30日5:09 Michael Segel <michael_segel@hotmail.com>:
>
> UUIDs wont clash. Especially if you're using version 5 which is a truncated SHA-1 hash of the UUID.
>
>
>> From: germoglio@gmail.com
>> Date: Thu, 29 Apr 2010 13:58:42 -0300
>> Subject: Re: Unique row ID constraint
>> To: hbase-user@hadoop.apache.org
>>
>> Hello Tatsuya,
>>
>> Can the keys be randomly generated, or must they be incremental? Remember
>> that you can achieve higher throughput if they are randomly generated, since
>> the insertions will possibly load all machines more evenly.
>>
>> Using UUIDs may ensure key uniqueness (I don't expect a UUID clash any
>> time soon :-) and load balance over the cluster, but if you are paranoid enough you can
>> also check whether a row already exists by using
>> checkAndPut<http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/HTable.html#checkAndPut(byte[],
>> byte[], byte[], byte[], org.apache.hadoop.hbase.client.Put)> (just check for
>> an empty byte array value in a column that you can ensure always has
>> some value).
>>
>> On Thu, Apr 29, 2010 at 1:36 PM, Todd Lipcon <todd@cloudera.com> wrote:
>>
>> > Hi Tatsuya,
>> >
>> > Note that your solution is not correct in the case of failure, since the
>> > check and put are not atomic with each other.
>> >
>> > If your client or server fails between the ICV and the put, no other
>> > clients
>> > will be able to put the row, but there will be no data.
>> >
>> > -Todd
>> >
>> >
>> > On Thu, Apr 29, 2010 at 1:33 AM, Tatsuya Kawano <tatsuyaml@snowcocoa.info
>> > >wrote:
>> >
>> > > Hi Stack and Ryan,
>> > >
>> > > Thanks for your advice. I knew using a row lock wasn't ideal, but I
>> > > couldn't find an appropriate atomic operation to do compare-and-swap.
>> > >
>> > > So, thanks Stack for helping me find it. I found that the
>> > > incrementColumnValue() atomic operation works for me, since it
>> > > automatically initializes the column value to 0 when the column
>> > > doesn't exist. I can try to increment the column value by 1, and if it
>> > > returns 1, I can be sure that I'm the first one who has created the
>> > > column and row.
>> > >
>> > > So, my updated code is much simpler and now lock-free.
>> > >
>> > > ===============================================
>> > >  def insert(table: HTable, put: Put): Unit = {
>> > >    val count = table.incrementColumnValue(put.getRow, family,
>> > >                                           uniqueQual, 1)
>> > >
>> > >    if (count == 1) {
>> > >      table.put(put)
>> > >
>> > >    } else {
>> > >      throw new DuplicateRowException("Tried to insert a duplicate row: "
>> > >              + Bytes.toString(put.getRow))
>> > >    }
>> > >  }
>> > > ===============================================
>> > >
>> > > Thanks,
>> > > Tatsuya
>> > >
>> > >
>> > >
>> > > 2010/4/29 Ryan Rawson <ryanobjc@gmail.com>:
>> > > > I would strongly discourage people from building on top of
>> > > > lockRow/unlockRow.  The problem is if a row is not available, lockRow
>> > > > will hold a responder thread and you can end up with a deadlock
>> > > > because the lock holder won't be able to unlock.  Sure the expiry
>> > > > system kicks in, but 60 seconds is kind of infinity in database terms
>> > > > :-)
>> > > >
>> > > > I would probably go with either ICV or CAS to build the tools you
>> > > > want.  With CAS you can accomplish a lot of things locking
>> > > > accomplishes, but more efficiently.
>> > > >
>> > > > On Wed, Apr 28, 2010 at 9:42 AM, Stack <stack@duboce.net> wrote:
>> > > >> Would the incrementValue [1] work for this?
>> > > >> St.Ack
>> > > >>
>> > > >> 1.
>> > >
>> > http://hadoop.apache.org/hbase/docs/r0.20.3/api/org/apache/hadoop/hbase/client/HTable.html#incrementColumnValue%28byte[],%20byte[],%20byte[],%20long%29
>> > > >>
>> > > >> On Wed, Apr 28, 2010 at 7:40 AM, Tatsuya Kawano
>> > > >> <tatsuyaml@snowcocoa.info> wrote:
>> > > >>> Hi,
>> > > >>>
>> > > >>> I'd like to implement a unique row ID constraint (like the primary
>> > > >>> key constraint in an RDBMS) in my application framework.
>> > > >>>
>> > > >>> Here is a code fragment from my current implementation (HBase
>> > > >>> 0.20.4rc) written in Scala. It works as expected, but is there any
>> > > >>> better (shorter) way to do this, like checkAndPut()? I'd like to
>> > > >>> pass a single Put object to my function (method) rather than
>> > > >>> passing rowId, family, qualifier and value separately. I can't do
>> > > >>> this now because I have to give the rowLock object when I
>> > > >>> instantiate the Put.
>> > > >>>
>> > > >>> ===============================================
>> > > >>> def insert(table: HTable, rowId: Array[Byte], family: Array[Byte],
>> > > >>>            qualifier: Array[Byte], value: Array[Byte]): Unit = {
>> > > >>>
>> > > >>>    val get = new Get(rowId)
>> > > >>>
>> > > >>>    val lock = table.lockRow(rowId) // will expire in one minute
>> > > >>>    try {
>> > > >>>      if (table.exists(get)) {
>> > > >>>        throw new DuplicateRowException("Tried to insert a duplicate row: "
>> > > >>>                + Bytes.toString(rowId))
>> > > >>>
>> > > >>>      } else {
>> > > >>>        val put = new Put(rowId, lock)
>> > > >>>        put.add(family, qualifier, value)
>> > > >>>
>> > > >>>        table.put(put)
>> > > >>>      }
>> > > >>>
>> > > >>>    } finally {
>> > > >>>      table.unlockRow(lock)
>> > > >>>    }
>> > > >>> }
>> > > >>> ===============================================
>> > > >>>
>> > > >>> Thanks,
>> > > >>>
>> > > >>> --
>> > > >>> 河野 達也
>> > > >>> Tatsuya Kawano (Mr.)
>> > > >>> Tokyo, Japan
>> > > >>>
>> > > >>> twitter: http://twitter.com/tatsuya6502
>> > >
>> >
>> >
>> >
>> > --
>> > Todd Lipcon
>> > Software Engineer, Cloudera
>> >
>>
>>
>>
>> --
>> Guilherme
>>
>> msn: guigermoglio@hotmail.com
>> homepage: http://sites.google.com/site/germoglio/
>
