hbase-user mailing list archives

From Stuart White <stuart.whi...@gmail.com>
Subject MapReduce job to update HBase table in-place
Date Wed, 25 Feb 2009 17:08:32 GMT
I'd like to write a MapReduce job to update an HBase table in-place,
and I'd like to solicit a little guidance.  Here's what I think I
should do, as well as some questions.  Any feedback is appreciated.

- I want to examine all the rows in the table, and for some subset of
these rows, update some of their column values.
- I don't think I need a reduce step, since nothing here requires
intermediate sorting.  A map-only job should suffice.
- I believe I should use a TableInputFormat for my input format.
- How should I write the changes back to the table?  Should I use a
TableOutputFormat and, in my Mapper, call .collect() to post my
updates?  Or should I manually update the table in my Mapper (by using
HTable.commit(BatchUpdate)) and never call .collect()?  What are the
pros/cons of these two approaches?
- Are there any problems with updating my input table in-place while
the job is still reading it?  Do I need to worry about any sort of
"cyclic" problem (output somehow coming back through the job as input
later)?  Since I'm only updating existing records, and not creating
new ones, I assume this is not a concern.
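
To make the plan above concrete, here is a minimal sketch in plain Java.
It deliberately does not use the HBase or Hadoop APIs: a TreeMap stands
in for the table (rows sorted by key), and the class name, the
"info:status" column, and the "stale"/"fresh" values are all
hypothetical.  It only illustrates the shape of a map-only pass that
visits every row and rewrites a subset in place without adding rows; it
says nothing about how a real HBase scanner behaves under concurrent
writes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a map-only, in-place table update.
// No HBase classes are used; a TreeMap models rows sorted by row key.
final class InPlaceUpdateSketch {

    // Builds a single-column row; the column name is an assumption.
    static Map<String, String> row(String status) {
        Map<String, String> columns = new HashMap<>();
        columns.put("info:status", status);
        return columns;
    }

    // Three sample rows, two of which need updating.
    static TreeMap<String, Map<String, String>> sampleTable() {
        TreeMap<String, Map<String, String>> table = new TreeMap<>();
        table.put("row1", row("stale"));
        table.put("row2", row("fresh"));
        table.put("row3", row("stale"));
        return table;
    }

    // The "map" step: examine every row, update only the matching ones.
    // Existing cells are overwritten and no row keys are added, so the
    // set of rows being iterated never changes during the pass.
    static int updateInPlace(TreeMap<String, Map<String, String>> table) {
        int updated = 0;
        for (Map.Entry<String, Map<String, String>> entry : table.entrySet()) {
            Map<String, String> columns = entry.getValue();
            if ("stale".equals(columns.get("info:status"))) {
                columns.put("info:status", "fresh");
                updated++;
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        TreeMap<String, Map<String, String>> table = sampleTable();
        int updated = updateInPlace(table);
        System.out.println("updated " + updated + " of " + table.size() + " rows");
    }
}
```

In this toy model each row is visited exactly once because the update
never inserts a new key.  Whether the same holds for the real table is
exactly the question in the last bullet.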

Any feedback/thoughts/observations are appreciated!  Thanks!
