hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: Get operation in HBase Map-Reduce methods
Date Tue, 20 Apr 2010 16:36:40 GMT

Going back to the OP's question... using get() within a M/R, the answer is yes.

However, you have a problem in that you need to somehow determine which row_id you
want to retrieve.

Since you're starting with a list of row_ids, that list should be the source for your m/r.
So you'd have to work out your mapper to take the data from this list as its source, and then
within each mapper's setup(), you connect to HBase so that the connection can be reused in
each iteration of map().
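
The lifecycle described above (connect once in setup(), do one get() per map() call, release the connection at the end) can be sketched without a cluster. This is only a model under stated assumptions: RowStore, FakeRowStore and RowIdLookupMapper are hypothetical names invented for illustration, not HBase or Hadoop types. In a real job the mapper would extend Hadoop's Mapper class and the store would wrap an HBase table handle and its get() call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical stand-in for an HBase table handle (not a real HBase type). */
interface RowStore {
    /** Returns column -> value for the row, or an empty map if the row is absent. */
    Map<String, String> get(String rowId);

    void close();
}

/** In-memory fake so the sketch runs without a cluster. */
class FakeRowStore implements RowStore {
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    boolean closed = false;

    void put(String rowId, String column, String value) {
        rows.computeIfAbsent(rowId, k -> new HashMap<>()).put(column, value);
    }

    @Override
    public Map<String, String> get(String rowId) {
        return rows.getOrDefault(rowId, Collections.emptyMap());
    }

    @Override
    public void close() {
        closed = true;
    }
}

/** Mirrors the mapper lifecycle: setup() once, map() per input record, cleanup() once. */
class RowIdLookupMapper {
    private final Supplier<RowStore> connector;
    private RowStore store;
    final List<String> emitted = new ArrayList<>();

    RowIdLookupMapper(Supplier<RowStore> connector) {
        this.connector = connector;
    }

    /** Open the connection once per task, not once per record. */
    void setup() {
        store = connector.get();
    }

    /** Each input record is one row ID; fetch that row and emit "rowId<TAB>columnCount". */
    void map(String line) {
        String rowId = line.trim();
        if (rowId.isEmpty()) {
            return; // skip blank input lines
        }
        Map<String, String> row = store.get(rowId);
        if (row.isEmpty()) {
            return; // row ID not present in the table
        }
        emitted.add(rowId + "\t" + row.size());
    }

    void cleanup() {
        store.close();
    }

    /** Drives the lifecycle the way the framework would for one input split. */
    static List<String> run(Supplier<RowStore> connector, List<String> rowIds) {
        RowIdLookupMapper mapper = new RowIdLookupMapper(connector);
        mapper.setup();
        try {
            for (String id : rowIds) {
                mapper.map(id);
            }
        } finally {
            mapper.cleanup();
        }
        return mapper.emitted;
    }
}
```

The point of the structure is that the cost of connecting is paid once per task, while each map() call pays only for a single-row lookup.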

I have a process where I scan one column family in a table, and based on information in the
record, I have to perform a get(), so what you want to do is possible in a M/R.

I don't have a good code example for your specific use case. The issue isn't in connecting
to hbase or doing the get. (That's trivial) The hard part is writing a mapper that takes a
list in memory as its input source.
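
One common way around the in-memory list problem, which is an assumption here and not something stated in this thread, is to stop treating the list as in-memory: write the row IDs to a file, one per line, and let a line-oriented input format (Hadoop ships TextInputFormat and NLineInputFormat) hand those lines to the mappers. The sketch below only models that staging step with the standard library. RowIdStaging and its methods are hypothetical names, and the local file stands in for a file that would be copied to HDFS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: stages a list of row IDs as line-oriented job input. */
class RowIdStaging {

    /** Writes one row ID per line; in a real job this file would be placed on HDFS. */
    static Path writeRowIds(List<String> rowIds, Path target) {
        try {
            return Files.write(target, rowIds, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Groups the file's lines into batches of at most linesPerSplit lines.
     * This only imitates how a line-based split would divide the work among mappers.
     */
    static List<List<String>> splitByLines(Path file, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<String> lines;
        try {
            lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }
}
```

Each batch then corresponds to the records one mapper task would see, so the number of lines per split controls how many get() calls each task performs.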

Now here's the point where someone from Cloudera, Yahoo! or somewhere else says that even
that piece is trivial and here's how to do it. :-)

-Mike


> Date: Tue, 20 Apr 2010 10:15:52 +0200
> Subject: Re: Get operation in HBase Map-Reduce methods
> From: jdcryans@apache.org
> To: hbase-user@hadoop.apache.org
> 
> What are the numbers like? Is it 1k rows you need to process? 1M? 10B?
> Your question is more about scaling (or the need to).
> 
> J-D
> 
> On Tue, Apr 20, 2010 at 8:39 AM, Andrey <atimerbaev@gmx.net> wrote:
> > Dear All,
> >
> > Assumed, I've got a list of rowIDs of a HBase table. I want to get each row by
> > its rowID, do some operations with its values, and store the results somewhere
> > subsequently. Is there a good way to do this in a Map-Reduce manner?
> >
> > As far as I understand, a mapper usually takes a Scan to form inputs. It is
> > quite possible to create such a Scan, which contains a lot of RowFilters to be
> > EQUAL to a particular <rowId>. Such a strategy will work for sure, however
> > is inefficient, since each filter will be tried to match to each found row.
> >
> > So, is there a good Map-Reduce praxis for such kind of situations? (E.g. to make
> > a Get operation inside a map() method.) If yes, could you kindly point to a good
> > code example?
> >
> > Thank you in advance.
> >
> >
 		 	   		  