hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: MapReduce job to update HBase table in-place
Date Wed, 25 Feb 2009 18:31:04 GMT

Currently there are some issues when outputting to a table with scanners
active on it, mainly that the regions won't be able to split until the
scanners are gone.

No, you should not have loop back issues.

My opinion is that you should just run a normal MR job with an identity
mapper or reducer. It will be simple and will avoid strange issues, although
it will be less efficient. Also be sure to call
jobconf.setNumTasksToExecutePerJvm(-1) if you are using 0.19.0.
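
To make the shape of that concrete, here is a rough plain-Java model of the
pattern: one pass reads the rows and collects updates for a subset of them,
and the writes are applied separately. It uses no HBase or Hadoop classes.
The TreeMap standing in for the table, the class name, and the
needsUpdate/updated rules are all hypothetical, chosen only for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model only: a TreeMap stands in for the table, and the
// update rule below is a made-up example, not anything from HBase.
class InPlaceUpdateSketch {

    // Hypothetical rule: only rows whose value starts with "old:" change.
    static boolean needsUpdate(String value) {
        return value.startsWith("old:");
    }

    // Hypothetical new value for a row that needs updating.
    static String updated(String value) {
        return "new:" + value.substring("old:".length());
    }

    // "Map" pass: read every row, emit an update only for the subset
    // that needs one. The table itself is not modified here.
    static Map<String, String> mapPhase(Map<String, String> table) {
        Map<String, String> updates = new TreeMap<>();
        for (Map.Entry<String, String> row : table.entrySet()) {
            if (needsUpdate(row.getValue())) {
                updates.put(row.getKey(), updated(row.getValue()));
            }
        }
        return updates;
    }

    // "Output" step: write the collected updates back under the same keys.
    static void applyUpdates(Map<String, String> table,
                             Map<String, String> updates) {
        table.putAll(updates);
    }

    public static void main(String[] args) {
        Map<String, String> table = new TreeMap<>();
        table.put("row1", "old:a");
        table.put("row2", "keep:b");
        table.put("row3", "old:c");

        Map<String, String> updates = mapPhase(table);
        applyUpdates(table, updates);

        System.out.println(updates.keySet()); // [row1, row3]
        System.out.println(table); // {row1=new:a, row2=keep:b, row3=new:c}
    }
}
```

Because the updates only overwrite existing row keys, the set of rows the
read pass sees does not grow, which is the intuition behind not expecting
loop-back problems. In a real job the read side would be the table input
format and the write side the table output format.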


On Wed, Feb 25, 2009 at 12:08 PM, Stuart White <stuart.white1@gmail.com> wrote:

> I'd like to write a MapReduce job to update an HBase table in-place,
> and I'd like to solicit a little guidance.  Here's what I think I
> should do, as well as some questions.  Any feedback is appreciated.
> - I want to examine all the rows in the table, and for some subset of
> these rows, update some of their column values.
> - I believe I do not need a reduce step, because I have no need for
> intermediate sorting.  I believe only having a Mapper will suffice.
> - I believe I should use a TableInputFormat for my input format.
> - How should I write the changes back to the table?  Should I use a
> TableOutputFormat and, in my Mapper, call .collect() to post my
> updates?  Or should I manually update the table in my Mapper (by using
> HTable.commit(BatchUpdate)) and never call .collect()?  What are the
> pros/cons of these two approaches?
> - Are there any concerns with the fact that I want to update my input
> table in-place during the job?  Do I need to be concerned with any
> sort of "cyclic" problems (somehow the output coming back through the
> job as input later)?  Since I'm only updating records, and not
> creating new ones, I assume this is not a concern.
> Any feedback/thoughts/observations are appreciated!  Thanks!
