hbase-user mailing list archives

From "Cedric Ho" <cedric...@gmail.com>
Subject Re: map reduce range of records from hbase table
Date Thu, 09 Oct 2008 05:50:31 GMT
Thanks for the solutions, I've tried overriding getSplits and it does
what I need.

But for the RowFilter, I guess it would also need to scan through all
records and do filtering. So wouldn't it be the same if I do the
filtering myself during the map phase?
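
For anyone finding this thread later, here is a rough sketch of the idea
behind the getSplits override: keep only the regions whose key range
overlaps the wanted range, and clip each kept split to that range. It is
plain Java and deliberately does not call any HBase API. RangeSplits,
KeyRange and pruneSplits are hypothetical names made up for this
illustration, and row keys are modelled as Strings (real HBase row keys
are byte arrays compared lexicographically). An empty string stands for
an unbounded start or end key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RangeSplits {
    /** Key range: start inclusive, end exclusive; "" means unbounded (assumption). */
    static final class KeyRange {
        final String start;
        final String end;
        KeyRange(String start, String end) { this.start = start; this.end = end; }
        @Override public String toString() { return "[" + start + "," + end + ")"; }
    }

    /** True if the region's key range intersects the wanted range. */
    static boolean overlaps(KeyRange region, KeyRange wanted) {
        boolean startsBeforeWantedEnd =
            wanted.end.isEmpty() || region.start.compareTo(wanted.end) < 0;
        boolean endsAfterWantedStart =
            region.end.isEmpty() || region.end.compareTo(wanted.start) > 0;
        return startsBeforeWantedEnd && endsAfterWantedStart;
    }

    /** Smaller of two end keys, treating "" as "no upper bound". */
    static String minEnd(String a, String b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return a.compareTo(b) < 0 ? a : b;
    }

    /** One split per overlapping region, clipped to the wanted range. */
    static List<KeyRange> pruneSplits(List<KeyRange> regions, KeyRange wanted) {
        List<KeyRange> splits = new ArrayList<>();
        for (KeyRange r : regions) {
            if (!overlaps(r, wanted)) {
                continue; // this region is never handed to a map task
            }
            // "" sorts before every other string, so a plain max works for start keys.
            String start = r.start.compareTo(wanted.start) > 0 ? r.start : wanted.start;
            splits.add(new KeyRange(start, minEnd(r.end, wanted.end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Four made-up regions covering the whole key space.
        List<KeyRange> regions = Arrays.asList(
            new KeyRange("", "g"), new KeyRange("g", "n"),
            new KeyRange("n", "t"), new KeyRange("t", ""));
        // Only rows in [h, p) are wanted.
        System.out.println(pruneSplits(regions, new KeyRange("h", "p")));
        // prints [[h,n), [n,p)]
    }
}
```

In this sketch two of the four regions never produce a split at all,
whereas filtering in the map phase would still read every row of every
region before discarding the unwanted ones.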


On Thu, Oct 9, 2008 at 5:13 AM, stack <stack@duboce.net> wrote:
> Cedric Ho wrote:
>> Hi all,
>> I am using 0.18.0 and have successfully used data from hbase table as
>> input to my map/reduce job.
>> I wonder how to specify a subset of records from a table instead of
>> taking all records as input.
>> Such as a range of the row keys or maybe by specific values of certain
>> columns.
> You'll have to subclass the TableInputFormat.
> There is an example in the javadoc on subclassing TIF:
> http://hadoop.apache.org/hbase/docs/r0.18.0/api/org/apache/hadoop/hbase/mapred/TableInputFormatBase.html
> (Sorry, the example is mangled.  Do a get of the html source to see
> non-garbled code).
> The example shows you how to set a filter.  Filters can filter on rows and
> values.
> To work against a subset, you'd probably need to play with getSplits in
> your subclass. By default, it basically returns as many splits as there are
> regions in your table, so it's always the whole table. Filters could stop
> unwanted rows being returned, but maybe it's better if the rows weren't
> considered in the first place; hence the need to override getSplits.
> St.Ack
