hbase-user mailing list archives

From Jonathan Gray <jg...@facebook.com>
Subject RE: HTable checkAndPut equivalent for Deletes
Date Fri, 30 Apr 2010 21:58:05 GMT
One option would be to just do the delete. Deletes are cheap, and nothing bad will happen if
you delete data that doesn't exist (unless you use the delete-latest-version form, which does
require a value to exist).
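
Why a blind delete is safe can be shown with a small in-memory model. This is a minimal sketch: the class `BlindDeleteModel` and its methods are hypothetical and are not the HBase API. Removing all versions of a cell that is absent changes nothing, so the call can be repeated freely. Deleting only the latest version needs an existing version to target, so the model reports failure when there is none.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of one column per row (timestamp -> value).
// Illustrates the semantics described above; it is not HBase code.
public class BlindDeleteModel {
    // row -> versions of the cell, ordered newest timestamp first
    private final Map<String, TreeMap<Long, byte[]>> cells = new HashMap<>();

    public void put(String row, long timestamp, byte[] value) {
        cells.computeIfAbsent(row,
                r -> new TreeMap<Long, byte[]>(Collections.<Long>reverseOrder()))
             .put(timestamp, value);
    }

    // Removes every version; a no-op when nothing is stored, so it is safe to repeat.
    public void deleteAllVersions(String row) {
        cells.remove(row);
    }

    // Removes only the newest version. There must be a version to target,
    // so this returns false when the cell does not exist.
    public boolean deleteLatestVersion(String row) {
        TreeMap<Long, byte[]> versions = cells.get(row);
        if (versions == null || versions.isEmpty()) {
            return false;
        }
        versions.pollFirstEntry(); // first entry = newest timestamp
        if (versions.isEmpty()) {
            cells.remove(row);
        }
        return true;
    }

    public int versionCount(String row) {
        TreeMap<Long, byte[]> versions = cells.get(row);
        return versions == null ? 0 : versions.size();
    }
}
```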

> -----Original Message-----
> From: Michael Dalton [mailto:mwdalton@gmail.com]
> Sent: Friday, April 30, 2010 2:51 PM
> To: hbase-user@hadoop.apache.org
> Subject: HTable checkAndPut equivalent for Deletes
> 
> Hi everyone,
> 
> I have a quick question -- I'd like to do a simple atomic check-and-Delete
> for a row. For Put operations, HTable.checkAndPut appears to allow a simple
> atomic compare-and-update, which is great. However, there doesn't seem to
> be an equivalent function for deletes.
> 
> I was thinking about approximating this by writing a NULL or zero-length
> byte array as the value in a Put to emulate deleting a cell. It appears
> that checkAndPut already treats a zero-length array as equivalent to a
> non-existent value when performing its comparison (before committing the
> Put). The only drawback I can see is that I never truly remove rows; I
> just end up with 'dead' rows containing empty byte arrays, so I'd imagine
> that every N hours or days I would need to garbage-collect these empty
> rows somehow (which brings us back full circle to the issue of how to
> atomically check and delete a row).
> 
> The only real alternative I can see for doing this would be to emulate
> checkAndDelete by using RowLocks to lock the row, perform a Get, verify
> that the row contains the expected value, then perform a delete, and then
> unlock the row itself. Correct me if I'm wrong, but this should definitely
> emulate the semantics of atomic compare-and-Delete (assuming the compare
> and delete operate on the same row and use the RowLock). However, I'm not
> sure what the performance would be for using RowLocks to emulate
> checkAndDelete on the client side vs. using Put+checkAndPut to emulate
> checkAndDelete on the server side. Does anyone have any advice on this
> issue, or any idea what the relative tradeoffs are?
> 
> In the long run, it seems to me that the clearly optimal solution would
> be to have a checkAndDelete function in HTable, and I'd be interested in
> adding this functionality if no one else is currently working on it. Is
> that something that would be interesting to integrate and worth
> committing back to mainline? Are there any hidden pitfalls I should be
> aware of, or some technical/design reason why this API call doesn't
> already exist? If not, I'll take a hard look at the delete and
> checkAndPut code in the regionserver, and sometime soon open an issue in
> JIRA and start coding.
> 
> Best regards,
> 
> Mike
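
The RowLock approach described in the message (lock the row, Get, compare, Delete, unlock) can be sketched with a hypothetical in-memory model. `CheckAndDeleteModel`, its methods, and the per-row `ReentrantLock` map are assumptions for illustration, not the HBase client or regionserver API. Its `checkAndPut` treats a missing cell and a zero-length value as equal, mirroring the behaviour described in the message, so the same model also covers the empty-value tombstone idea.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for a table with one column: row -> value.
// Models lock / get / compare / write / unlock; it is not the HBase API.
public class CheckAndDeleteModel {
    private final Map<String, byte[]> rows = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String row) {
        return rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
    }

    // A missing cell and a zero-length value compare as equal.
    private static boolean matches(byte[] current, byte[] expected) {
        boolean currentEmpty = current == null || current.length == 0;
        boolean expectedEmpty = expected == null || expected.length == 0;
        if (currentEmpty || expectedEmpty) {
            return currentEmpty && expectedEmpty;
        }
        return Arrays.equals(current, expected);
    }

    // Writes value only if the current value matches expected.
    public boolean checkAndPut(String row, byte[] expected, byte[] value) {
        ReentrantLock lock = lockFor(row);
        lock.lock();
        try {
            if (!matches(rows.get(row), expected)) {
                return false;
            }
            rows.put(row, value);
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Emulated checkAndDelete: lock the row, read, compare, delete, unlock.
    // Atomic only if every writer to this row takes the same lock.
    public boolean checkAndDelete(String row, byte[] expected) {
        ReentrantLock lock = lockFor(row);
        lock.lock();
        try {
            if (!matches(rows.get(row), expected)) {
                return false;
            }
            rows.remove(row);
            return true;
        } finally {
            lock.unlock();
        }
    }

    public byte[] get(String row) {
        return rows.get(row);
    }
}
```

The guarantee comes entirely from every writer taking the same per-row lock; a writer that bypasses it can change the value between the compare and the delete. Against a real cluster, the client-side variant presumably costs several round trips (lock, get, delete, unlock) per operation, whereas a server-side checkAndDelete would do the compare and delete in a single call.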
