hbase-user mailing list archives

From Christopher Dorner <christopher.dor...@gmail.com>
Subject Reduce-side-join, input from hbase and hdfs
Date Sun, 16 Oct 2011 09:48:21 GMT
Hi,

I am considering doing reduce-side joins, where one input would be read 
from HDFS and the other from an HBase table.
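
To make the intent concrete, here is a minimal plain-Java sketch of the kind of join I mean, with no Hadoop or HBase dependency. The class name ReduceSideJoinSketch, the source tags "hdfs" and "hbase", and the in-memory maps standing in for the two inputs are all made up for illustration; in a real job each input would have its own mapper and the framework would do the grouping by key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReduceSideJoinSketch {

    // A map-output value tagged with the (hypothetical) source it came from.
    static final class Tagged {
        final String source;
        final String value;
        Tagged(String source, String value) {
            this.source = source;
            this.value = value;
        }
    }

    // "Map" phase: one mapper per input; each tags its values with the source name
    // and emits them under the join key. The shuffle map stands in for the framework.
    static void mapSide(String source, Map<String, String> input,
                        Map<String, List<Tagged>> shuffle) {
        for (Map.Entry<String, String> e : input.entrySet()) {
            shuffle.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                   .add(new Tagged(source, e.getValue()));
        }
    }

    // "Reduce" phase: for one key, pair every HDFS value with every HBase value.
    static List<String> reduceSide(String key, List<Tagged> values) {
        List<String> fromHdfs = new ArrayList<>();
        List<String> fromHbase = new ArrayList<>();
        for (Tagged t : values) {
            if (t.source.equals("hdfs")) {
                fromHdfs.add(t.value);
            } else {
                fromHbase.add(t.value);
            }
        }
        List<String> joined = new ArrayList<>();
        for (String l : fromHdfs) {
            for (String r : fromHbase) {
                joined.add(key + "\t" + l + "\t" + r);
            }
        }
        return joined;
    }

    // Inner join of the two inputs on their keys, in sorted key order.
    static List<String> join(Map<String, String> hdfs, Map<String, String> hbase) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>();
        mapSide("hdfs", hdfs, shuffle);
        mapSide("hbase", hbase, shuffle);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            out.addAll(reduceSide(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> hdfs = new TreeMap<>();
        hdfs.put("k1", "a");
        hdfs.put("k2", "b");
        Map<String, String> hbase = new TreeMap<>();
        hbase.put("k1", "x");
        hbase.put("k3", "z");
        for (String line : join(hdfs, hbase)) {
            System.out.println(line);
        }
    }
}
```

Only keys present in both inputs produce output, so this is an inner join; the point is just that the reducer needs to see records from both sources under the same key.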

Is it somehow possible to use

TableMapReduceUtil.initTableMapperJob(table, scan, Mapper_HBase.class, 
..., job);

and

MultipleInputs.addInputPath(job, path, ..., Mapper_HDFS.class)

at the same time in one job?
It seems that MultipleInputs takes priority when I use both: Mapper_HBase 
is not executed. It does execute when I remove the MultipleInputs call.


Also, is there an equivalent of MultipleInputs for HBase tables, 
e.g. a MultipleTableInputs? I saw a request for this here:
https://issues.apache.org/jira/browse/HBASE-2965


A workaround would be to write the Scan results to HDFS first and then do 
the reduce-side join using MultipleInputs, but I would like to avoid this 
additional I/O overhead.

Thanks,
Christopher



