From: Ryan Rawson
To: hbase-user@hadoop.apache.org
Date: Tue, 2 Jun 2009 17:02:28 -0700
Subject: Re: Uses cases for checkAndSave?

The way I think about checkAndSave might work like this:

Takes a Get() object to specify which row and column to affect
Takes a Result object to verify said data. This should match the Get()
Takes a Put or maybe Delete to apply if the previous two worked.

-ryan

On Tue, Jun 2, 2009 at 4:51 PM, Guilherme Germoglio wrote:

> Hello!
>
> On Tue, Jun 2, 2009 at 3:58 PM, Erik Holstad wrote:
>
> > Hi!
> >
> > On Tue, Jun 2, 2009 at 11:17 AM, Guilherme Germoglio
> > <germoglio@gmail.com> wrote:
> >
> > > Hi Erik,
> > >
> > > For now, I'm using checkAndSave in order to make sure that a row is
> > > only created but not overwritten by multiple threads. So, checkAndSave
> > > is mostly invoked with a new structure created on the client.
> > > Actually, I'm checking
> > > if a specific "deleted" column is empty. If the "deleted" column is not
> > > empty, then the row creation cannot be performed. There are a few other
> > > tricky cases where I'm using it, but I'm sure that making that Result
> > > object more difficult to create than putting values on a map would be
> > > bad for me. :-)
> >
> > So you have a row with family and qualifier that you check to see if it
> > is empty, and if it is you insert a new row? So basically you use it as
> > an atomic rowExist checker? Are you usually batching these checks, or
> > would it be ok with something like:
> >
> > public boolean checkAndPut(byte[] row, byte[] family, byte[] qualifier,
> > byte[] value, Put put){}
> > or
> > public boolean checkAndPut(KeyValue checkKv, Put put){}
> > for now?
>
> Yes. It is ok for me to use the methods above for now.
>
> Just in case you are curious about how I'll be using them, there are two
> cases where I'm using checkAndSave:
>
> The first is like the atomic rowExist checker and it represents 90% of the
> use of checkAndSave. Exactly as you said, I've got a column
> attributes:deleted for every row. When creating a new row, the creation
> only happens if this column is empty. When the row creation happens, this
> column is assigned a 'false' value. When this column receives a 'true'
> value, that is, the row is to be deleted, the 'hard' removal (a HTable's
> Delete) of the row will be performed asynchronously. Until the 'hard'
> removal happens, a software layer that uses HTable will prevent the use of
> any 'soft' deleted row by checking the attributes:deleted column.
>
> The second case of using checkAndSave is to trigger some actions when a
> specific column is updated. So, I don't check for emptiness, but whether a
> previous value is still the same when I'm updating the row. For example,
> let's say I have a users table where I will serialize a User object and put
> it into a row.
> Among other things, the User object contains an e-mail
> attribute and its change must trigger verification actions, changes on
> other tables, whatever. I realized that performing a get for every User
> update just to check whether their e-mail changed or not might not be the
> best approach, since changing e-mail is not a very common operation. So, I
> thought it is better to checkAndSave a user expecting their current e-mail
> value to be the same as the one already in the table, since this will
> occur many, many times more often than the opposite. However, if the
> current e-mail value is different from the one in the table, triggers are
> fired and then a new update is performed.
>
> > > However, here's an idea. What if Put and Delete objects have a field
> > > "condition" (maybe, "onlyIf" would be a better name) which is exactly
> > > the map with columns and expected values. So, a given Put or Delete of
> > > an updates list will only happen if those expected values match.
> >
> > Puts and deletes are pretty much just List which is basically a List.
> > I don't think that we want to add complexity for puts and deletes now
> > that we have worked so hard to make it faster and more bare-bones.
>
> no problem. (sorry!)
>
> > > Also, maybe it should be possible to indicate common expected values
> > > for all updates of a list too, so a client won't have to put the same
> > > values in all updates if needed. But we must remember to resolve
> > > conflicts between expected values.
> >
> > Not really sure if you mean that we would check the value of a key before
> > inserting the new value? That would mean that you would have to do a get
> > for every put/delete, which is not something we want in the general case.
> > > (By the way, I haven't seen the guts of the new Puts and Deletes, so I
> > > don't know how difficult it would be to implement it -- but I can help,
> > > if necessary)
> > >
> > > Thanks,
> > >
> > > On Tue, Jun 2, 2009 at 2:34 PM, Erik Holstad wrote:
> > >
> > > > Hi!
> > > > I'm working on putting checkAndSave back into 0.20 and just want to
> > > > check with the people that are using it how they are using it
> > > > so that I can make it as good as possible for these users.
> > > >
> > > > Since the API has changed from earlier versions there are some things
> > > > that one needs to think about.
> > > > For now in the new API there are no Updates, just Put and Delete, so
> > > > for now I need to know if users used to delete in the old batchUpdate
> > > > or just put?
> > > >
> > > > The new return format Result might seem like a good way to send in
> > > > the data to be used as "actual", but there is no super easy way to
> > > > build that on the client side for now, so it would be good to know
> > > > how you are doing this.
> > > > Do you do a get, save the result and then use it for the check, or do
> > > > you just create new structures on the client?
> > > >
> > > > Regards Erik
> > >
> > > --
> > > Guilherme
> > >
> > > msn: guigermoglio@hotmail.com
> > > homepage: http://germoglio.googlepages.com
> >
> > Regards Erik
>
> --
> Guilherme
>
> msn: guigermoglio@hotmail.com
> homepage: http://germoglio.googlepages.com
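
The two use cases described in this thread (create a row only if attributes:deleted is empty, and update a row only if the stored e-mail is still the expected one) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: InMemoryTable is a hypothetical in-memory stand-in, not HBase code, and its checkAndPut only loosely follows the checkAndPut(row, family, qualifier, value, put) shape proposed above, with String columns in place of byte[] and a plain Map in place of a Put. The column names data:email and data:name are likewise made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class CheckAndPutSketch {
    public static void main(String[] args) {
        InMemoryTable table = new InMemoryTable();

        // Use case 1: create the row only if attributes:deleted is empty.
        Map<String, String> newRow =
                Map.of("attributes:deleted", "false", "data:email", "a@example.com");
        boolean created = table.checkAndPut("user1", "attributes:deleted", null, newRow);
        boolean createdAgain = table.checkAndPut("user1", "attributes:deleted", null, newRow);
        System.out.println("created=" + created + ", createdAgain=" + createdAgain);

        // Use case 2: update only if the stored e-mail is still the expected one.
        boolean updated = table.checkAndPut(
                "user1", "data:email", "a@example.com", Map.of("data:name", "Ann"));
        boolean stale = table.checkAndPut(
                "user1", "data:email", "old@example.com", Map.of("data:name", "Bob"));
        System.out.println("updated=" + updated + ", stale=" + stale);
    }
}

/** Hypothetical in-memory stand-in for a table; NOT the HBase API. */
class InMemoryTable {
    // row key -> (column name -> value)
    private final Map<String, Map<String, String>> rows = new HashMap<>();

    /**
     * Applies the update only if checkColumn currently holds expected.
     * A null expected value means "the column must be empty".
     * The method is synchronized so the check and the write happen as one step.
     */
    public synchronized boolean checkAndPut(String row, String checkColumn,
                                            String expected, Map<String, String> update) {
        Map<String, String> columns = rows.get(row);
        String actual = (columns == null) ? null : columns.get(checkColumn);
        if (!Objects.equals(actual, expected)) {
            return false; // check failed; nothing is written
        }
        rows.computeIfAbsent(row, k -> new HashMap<>()).putAll(update);
        return true;
    }

    /** Returns the stored value, or null if the row or column is absent. */
    public synchronized String get(String row, String column) {
        Map<String, String> columns = rows.get(row);
        return (columns == null) ? null : columns.get(column);
    }
}
```

In this sketch a failed check leaves the row untouched, so the caller decides what to do next: for the create case it can report that the row already exists, and for the e-mail case it can fire its triggers and retry with the new expected value, as described in the thread.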