hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: Remove the row in MR job?
Date Fri, 12 Oct 2012 18:41:30 GMT

I'm not entirely sure of the use-case, but here are some thoughts on this...

re:  "should I take the table from the pool, and simply call the delete method?"

Yep, you can construct an HTable instance within a MR job.  But use the
delete that takes a list because the single-delete will invoke an RPC for
each one (not great over an MR job).

Construct the HTable instance at the Mapper level (in setup, not inside the
map method) and keep a buffer of deletes in a List.  Send the buffer as a
batch whenever it fills up, and send any unprocessed deletes in the cleanup
method, which runs once at the end of each map task.
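
As a rough sketch of that buffering pattern (plain Java, no HBase
dependency, so it can be run on its own): BufferedDeleter, its batchSize
parameter and the sink callback are hypothetical names made up for this
illustration.  In a real Mapper the sink would presumably wrap a call to
the list-taking delete on the HTable, add() would be called from map(),
and flush() from cleanup().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: buffers row keys and hands them to a sink in batches. */
class BufferedDeleter {
    private final int batchSize;
    // Stand-in for the batched delete call (assumption: one RPC round per batch).
    private final Consumer<List<String>> sink;
    private final List<String> buffer = new ArrayList<>();

    BufferedDeleter(int batchSize, Consumer<List<String>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Call once per row from map(): queue the key, send a batch when full. */
    void add(String rowKey) {
        buffer.add(rowKey);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Call from cleanup(): send whatever is still buffered, if anything. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Pass a copy so the sink can keep the batch after the buffer is cleared.
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    public static void main(String[] args) {
        BufferedDeleter deleter =
            new BufferedDeleter(3, batch -> System.out.println("delete batch: " + batch));
        for (int i = 1; i <= 7; i++) {
            deleter.add("row" + i);   // what map() would do for each row
        }
        deleter.flush();              // what cleanup() would do at the end
    }
}
```

With a batch size of 3 and seven rows, the sink is called three times: two
full batches from add() and one final batch of a single row from flush().
This is why the mapper does not need to know which row is the last one.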

I'm not entirely sure why you'd want to delete every row in a table, as
opposed to processing all the rows in Table1, generating an entirely new
Table2, and then dropping Table1 when you're done with it.  That seems
like it would be less hassle than deleting every row (since the table ends
up empty anyway).

On 10/12/12 1:20 PM, "Jean-Marc Spaggiari" <jean-marc@spaggiari.org> wrote:

>I have a table which I want to parse over a MR job.
>Today, I'm using a scan to parse all the rows. Each row is retrieved,
>removed, and then parsed (feeding 2 other tables).
>The goal is to parse all the content while some process might still be
>adding some more.
>On the map method from the MR job, can I delete the row I'm working
>with? If so, how should I do? should I take the table from the pool,
>and simply call the delete method? The issue is, doing a delete for
>each line will take a while. I would prefer to batch them, but I don't
>know when will be the last line, so it's difficult to know when to
>send the batch.  Is there a way to say to the MR job to delete this
>line? Also, what's the impact on the MR job if I delete the row it's
>working on?
>Or is the MR job not the best way to do that?
