hbase-user mailing list archives

From "Billy Pearson" <sa...@pearsonwholesale.com>
Subject Re: MapReduce job to update HBase table in-place
Date Thu, 26 Feb 2009 20:13:10 GMT
As long as the column you are updating is the same one you are reading, a
map-only job will work. If the data you read will update a different column, I
would use a reduce step to do all the reading first, then write the updates.
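
A rough way to picture the two shapes, sketched in plain Java against an
in-memory map rather than a real table. Nothing here is HBase API: the class,
the method names, and the "info:status" style column names in the usage note
are all made up for illustration, and the second method only imitates the
"read everything first, then write" idea, not an actual reduce phase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// In-memory stand-in for a table: row key -> (column name -> value).
// Assumption: this is NOT HBase API; every name below is hypothetical.
class InPlaceUpdateSketch {

    // One buffered write, held back until every row has been read.
    static final class PendingUpdate {
        final String row;
        final String column;
        final String value;

        PendingUpdate(String row, String column, String value) {
            this.row = row;
            this.column = column;
            this.value = value;
        }
    }

    // Map-only shape: each row is read and the new value is written
    // straight back to the same column during the single pass.
    static void mapOnlyUpdate(Map<String, Map<String, String>> table,
                              String column,
                              UnaryOperator<String> change) {
        for (Map<String, String> row : table.values()) {
            String current = row.get(column);
            if (current != null) {
                row.put(column, change.apply(current));
            }
        }
    }

    // Read-first shape: pass 1 only reads and buffers the updates,
    // pass 2 applies them, so no read ever sees a write from this job.
    static void readThenWrite(Map<String, Map<String, String>> table,
                              String readColumn,
                              String writeColumn,
                              UnaryOperator<String> derive) {
        List<PendingUpdate> pending = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : table.entrySet()) {
            String current = entry.getValue().get(readColumn);
            if (current != null) {
                pending.add(new PendingUpdate(
                        entry.getKey(), writeColumn, derive.apply(current)));
            }
        }
        for (PendingUpdate update : pending) {
            table.get(update.row).put(update.column, update.value);
        }
    }
}
```

For example, mapOnlyUpdate(table, "info:status", String::toUpperCase) rewrites
each status value in place, while readThenWrite(table, "info:status",
"info:length", s -> String.valueOf(s.length())) reads every status first and
only then fills in the other column.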


"Stuart White" <stuart.white1@gmail.com> wrote in message
> I'd like to write a MapReduce job to update an HBase table in-place,
> and I'd like to solicit a little guidance.  Here's what I think I
> should do, as well as some questions.  Any feedback is appreciated.
> - I want to examine all the rows in the table, and for some subset of
> these rows, update some of their column values.
> - I believe I do not need a reduce step, because I have no need for
> intermediate sorting.  I believe only having a Mapper will suffice.
> - I believe I should use a TableInputFormat for my input format.
> - How should I write the changes back to the table?  Should I use a
> TableOutputFormat and, in my Mapper, call .collect() to post my
> updates?  Or should I manually update the table in my Mapper (by using
> HTable.commit(BatchUpdate)) and never call .collect()?  What are the
> pros/cons of these two approaches?
> - Are there any concerns with the fact that I want to update my input
> table in-place during the job?  Do I need to be concerned with any
> sort of "cyclic" problems (somehow the output coming back through the
> job as input later)?  Since I'm only updating records, and not
> creating new ones, I assume this is not a concern.
> Any feedback/thoughts/observations are appreciated!  Thanks!
