hbase-user mailing list archives

From Guilherme Germoglio <germog...@gmail.com>
Subject Re: Uses cases for checkAndSave?
Date Tue, 02 Jun 2009 23:51:07 GMT
Hello!

On Tue, Jun 2, 2009 at 3:58 PM, Erik Holstad <erikholstad@gmail.com> wrote:

> Hi!
>
> On Tue, Jun 2, 2009 at 11:17 AM, Guilherme Germoglio <germoglio@gmail.com> wrote:
>
> > Hi Erik,
> >
> > For now, I'm using checkAndSave to make sure that a row is only created,
> > and never overwritten, when multiple threads write to it. So checkAndSave
> > is mostly invoked with a new structure created on the client. Specifically,
> > I'm checking whether a specific "deleted" column is empty. If the "deleted"
> > column is not empty, the row creation cannot be performed. There are a few
> > other tricky cases where I use it, but I'm sure that having to build that
> > Result object, which is harder to create than putting values in a map,
> > would be bad for me. :-)
>
> So you have a row with a family and qualifier that you check to see if it is
> empty, and if it is, you insert a new row? So basically you use it as an
> atomic rowExist checker? Are you usually batching these checks, or would it
> be OK with something like:
>
> public boolean checkAndPut(byte[] row, byte[] family, byte[] qualifier,
> byte[] value, Put put){}
> or
> public boolean checkAndPut(KeyValue checkKv, Put put){}
> for now?
>

Yes. It is ok for me to use the methods above for now.
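To make the contract of the first proposed signature concrete, here is a toy in-memory model of it. It is a sketch under stated assumptions, not HBase code: the class `ToyTable` is hypothetical, a plain `Map<String, byte[]>` (column name to value) stands in for a `Put`, `synchronized` stands in for the server-side row lock that would make the check and the write one atomic step, and treating a null expected value as "the cell must be empty" is an assumption of this model.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of checkAndPut(row, family, qualifier, value, put).
// byte[] uses identity equals/hashCode, so row and column keys become Strings here.
class ToyTable {
    // row -> ("family:qualifier" -> value)
    private final Map<String, Map<String, byte[]>> rows = new HashMap<>();

    static String column(byte[] family, byte[] qualifier) {
        return new String(family, StandardCharsets.UTF_8) + ":"
                + new String(qualifier, StandardCharsets.UTF_8);
    }

    // Applies every cell of 'put' to 'row' only if family:qualifier currently
    // equals 'value'. A null 'value' means "the cell must be empty" (assumption).
    // 'synchronized' plays the role of the row lock: check and write are atomic.
    synchronized boolean checkAndPut(byte[] row, byte[] family, byte[] qualifier,
                                     byte[] value, Map<String, byte[]> put) {
        String key = new String(row, StandardCharsets.UTF_8);
        Map<String, byte[]> cells = rows.get(key);
        byte[] actual = (cells == null) ? null : cells.get(column(family, qualifier));
        if (!Arrays.equals(value, actual)) { // Arrays.equals(null, null) is true
            return false;
        }
        rows.computeIfAbsent(key, k -> new HashMap<>()).putAll(put);
        return true;
    }

    synchronized byte[] get(byte[] row, byte[] family, byte[] qualifier) {
        Map<String, byte[]> cells = rows.get(new String(row, StandardCharsets.UTF_8));
        return (cells == null) ? null : cells.get(column(family, qualifier));
    }
}
```

With a null expected value the first call creates the row and any later "create" returns false, which is the atomic rowExist behaviour discussed above.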

Just in case you are curious about how I'll be using them, there are two
cases where I'm using checkAndSave:

The first is like the atomic rowExist checker, and it accounts for 90% of my
uses of checkAndSave. Exactly as you said, I've got an attributes:deleted
column in every row. When creating a new row, the creation only happens if
this column is empty. When the row is created, this column is set to
'false'. When this column receives a 'true' value, that is, when the row is
to be deleted, the 'hard' removal of the row (an HTable Delete) is performed
asynchronously. Until the 'hard' removal happens, a software layer on top of
HTable prevents the use of any 'soft' deleted row by checking the
attributes:deleted column.
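The soft-delete lifecycle described above can be sketched as a small in-memory model. Everything here is illustrative: `SoftDeleteTable` and its methods are hypothetical, cell values are Strings instead of byte[] for readability, `synchronized` stands in for the row lock, and `hardDelete` only stands in for the asynchronous HTable Delete.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory model of the attributes:deleted soft-delete scheme.
class SoftDeleteTable {
    static final String DELETED = "attributes:deleted";
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    // Write 'cells' only if 'column' currently holds 'expected' (null = empty).
    private synchronized boolean checkAndPut(String row, String column, String expected,
                                             Map<String, String> cells) {
        Map<String, String> current = rows.get(row);
        String actual = (current == null) ? null : current.get(column);
        if (!Objects.equals(expected, actual)) {
            return false;
        }
        rows.computeIfAbsent(row, r -> new HashMap<>()).putAll(cells);
        return true;
    }

    // Create: succeeds only while attributes:deleted is empty, and marks the row live.
    boolean create(String row, Map<String, String> data) {
        Map<String, String> cells = new HashMap<>(data);
        cells.put(DELETED, "false");
        return checkAndPut(row, DELETED, null, cells);
    }

    // Soft delete: flip 'false' to 'true'; the hard removal happens later.
    boolean softDelete(String row) {
        return checkAndPut(row, DELETED, "false", Map.of(DELETED, "true"));
    }

    // The layer above the table treats a row as usable only if its flag is "false".
    synchronized boolean isLive(String row) {
        Map<String, String> current = rows.get(row);
        return current != null && "false".equals(current.get(DELETED));
    }

    // Stand-in for the asynchronous hard removal of the whole row.
    synchronized void hardDelete(String row) {
        rows.remove(row);
    }
}
```

In this model a soft-deleted row blocks re-creation until the hard removal clears the flag column. That follows from the "create only if empty" rule rather than from any extra logic.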

The second case is triggering some actions when a specific column is
updated. Here I don't check for emptiness, but whether a previous value is
still the same when I'm updating the row. For example, let's say I have a
users table where I serialize a User object and put it into a row. Among
other things, the User object contains an e-mail attribute, and changing it
must trigger verification actions, changes to other tables, and so on. I
realized that performing a get on every User update just to check whether
the e-mail changed might not be the best approach, since changing e-mail is
not a very common operation. So I thought it would be better to checkAndSave
a user, expecting that their current e-mail value is the same as the one
already in the table, since that will happen many more times than the
opposite. If the current e-mail value does differ from the one in the table,
the check fails, the triggers are fired, and then a new update is performed.
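A minimal sketch of this optimistic "assume the e-mail is unchanged" update, again as a hypothetical in-memory model: `UserTable`, `saveUser`, the columns `info:email` and `info:user`, and the `triggerLog` list that stands in for the real side effects are all assumptions for illustration. The fallback write is an unconditional put. The message does not say whether the real fallback used another check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical in-memory users table: row key -> (column -> value).
class UserTable {
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    final List<String> triggerLog = new ArrayList<>(); // records fired triggers

    synchronized boolean checkAndPut(String row, String column, String expected,
                                     Map<String, String> cells) {
        Map<String, String> current = rows.get(row);
        String actual = (current == null) ? null : current.get(column);
        if (!Objects.equals(expected, actual)) {
            return false;
        }
        rows.computeIfAbsent(row, r -> new HashMap<>()).putAll(cells);
        return true;
    }

    synchronized void put(String row, Map<String, String> cells) {
        rows.computeIfAbsent(row, r -> new HashMap<>()).putAll(cells);
    }

    synchronized String get(String row, String column) {
        Map<String, String> current = rows.get(row);
        return (current == null) ? null : current.get(column);
    }

    // Optimistic save: expect the stored e-mail to equal the new one, so the
    // common case needs no prior get. Only a failed check fires the triggers.
    void saveUser(String row, String email, String serializedUser) {
        Map<String, String> cells = Map.of("info:email", email, "info:user", serializedUser);
        if (checkAndPut(row, "info:email", email, cells)) {
            return; // e-mail unchanged: single conditional write
        }
        // Stored e-mail differs (or the row is new): run side effects, then write.
        triggerLog.add("email changed for " + row + " -> " + email);
        put(row, cells);
    }
}
```

Note that a brand-new row also fails the check here, because its e-mail cell is empty. So the triggers fire on first save too, which seems consistent with a new address needing verification.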



>
> >
> > However, here's an idea: what if Put and Delete objects had a field
> > "condition" (maybe "onlyIf" would be a better name) holding exactly that
> > map of columns and expected values? Then a given Put or Delete in a list
> > of updates would only be applied if those expected values match.
> >
>
> Puts and Deletes are pretty much just a List<KeyValue>, which is basically a
> List<byte[]>. I don't think we want to add complexity to Puts and Deletes
> now that we have worked so hard to make them faster and more bare-bones.
>

No problem (sorry!).


>
>
> > Also, maybe it should be possible to specify common expected values for
> > all updates in a list, so a client won't have to repeat the same values
> > in every update. But we must remember to resolve conflicts between
> > expected values.
> >
> I'm not really sure: do you mean that we would check the value of a key
> before inserting the new value? That would mean doing a get for every
> put/delete, which is not something we want in the general case.
>
>
> >
> > (By the way, I haven't looked at the guts of the new Puts and Deletes, so
> > I don't know how difficult it would be to implement this, but I can help
> > if necessary.)
> >
> > Thanks,
> >
> > On Tue, Jun 2, 2009 at 2:34 PM, Erik Holstad <erikholstad@gmail.com>
> > wrote:
> >
> > > Hi!
> > > I'm working on putting checkAndSave back into 0.20 and just want to
> > > check with the people who are using it how they are using it, so that
> > > I can make it as good as possible for them.
> > >
> > > Since the API has changed from earlier versions, there are some things
> > > one needs to think about. In the new API there are no Updates, just Put
> > > and Delete, so I need to know whether users used to delete in the old
> > > batchUpdate or only put.
> > >
> > > The new return format, Result, might seem like a good way to send in
> > > the data to be used as "actual", but for now there is no easy way to
> > > build one on the client side, so it would be good to know how you are
> > > doing this: do you do a get, save the result, and then use it for the
> > > check, or do you just create new structures on the client?
> > >
> > > Regards Erik
> > >
> >
> >
> >
> > --
> > Guilherme
> >
> > msn: guigermoglio@hotmail.com
> > homepage: http://germoglio.googlepages.com
> >
>
> Regards Erik
>



-- 
Guilherme

msn: guigermoglio@hotmail.com
homepage: http://germoglio.googlepages.com
