hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: HBase chain MapReduce job with broadcasting smaller tables to all Mappers
Date Thu, 03 Jul 2014 16:27:42 GMT
Did you read the summary object through HTable API in Job #2 ?
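A minimal sketch of that suggestion, in case it helps: fetch the serialized summary once in the Job #2 mapper's setup() through the HTable API, then use it in map(). The table name, row key, column names, and the MySummary class are all placeholders for illustration, not from the original post.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;

public class Job2Mapper extends TableMapper<ImmutableBytesWritable, Put> {
    // Hypothetical summary class produced by Job #1; assumed Serializable.
    private MySummary summary;

    @Override
    protected void setup(Context context) throws IOException {
        Configuration conf = context.getConfiguration();
        // "summary", "s", "cf", "data" are made-up names for the temporary
        // table, row key, column family, and qualifier written by Job #1.
        HTable table = new HTable(conf, "summary");
        try {
            Result result = table.get(new Get(Bytes.toBytes("s")));
            byte[] bytes = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("data"));
            ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes));
            try {
                summary = (MySummary) in.readObject();
            } catch (ClassNotFoundException e) {
                throw new IOException(e);
            } finally {
                in.close();
            }
        } finally {
            table.close();
        }
    }

    @Override
    protected void map(ImmutableBytesWritable row, Result value, Context context) {
        // `summary` is now available for every row read from the third table.
    }
}
```

Each mapper does this Get exactly once per task, so the extra load on the temporary table is small compared to shipping the object around some other way.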


On Thu, Jul 3, 2014 at 9:14 AM, Arun Allamsetty <arun.allamsetty@gmail.com> wrote:

> Hi,
> I am trying to write a chained MapReduce job on data present in HBase
> tables and need some help with the concept. I am not expecting people to
> provide code, but pseudocode for this based on HBase's Java API would be
> nice.
> In a nutshell, what I am trying to do is,
> MapReduce Job 1: Read data from two tables with no common row keys and
> create a summary out of them in the reducer. The output of the reducer is a
> Java object containing the summary, serialized to a byte array. I store
> this object in a temporary table in HBase.
> MapReduce Job 2: This is where I am having problems. I now need to read
> this summary object such that it is available in each mapper so that when I
> read data from a third (different) table, I can use this summary object to
> perform more calculations on the data I am reading from the third table.
> I read about the distributed cache and tried to implement it, but that
> doesn't seem to work out. I can provide more details in the form of edits
> if the need arises, because I don't want to spam this question right now
> with details which might be irrelevant.
> Thanks,
> Arun
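For the "serialized" step described in Job 1 above, here is a self-contained sketch of one way to do the round trip with plain Java serialization. The Summary class and its field are made up for illustration; the byte[] returned by toBytes() is what the Job #1 reducer would put into the temporary table, and fromBytes() is what the Job #2 mapper would call after fetching that cell.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class SummaryDemo {
    // Hypothetical summary object; any Serializable class works the same way.
    static class Summary implements Serializable {
        private static final long serialVersionUID = 1L;
        final Map<String, Long> counts = new HashMap<String, Long>();
    }

    // Serialize the summary to a byte array (what the Job #1 reducer
    // would store in the temporary HBase table).
    static byte[] toBytes(Summary s) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bos);
        out.writeObject(s);
        out.close();
        return bos.toByteArray();
    }

    // Deserialize in the Job #2 mapper's setup() after fetching the cell.
    static Summary fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes));
        try {
            return (Summary) in.readObject();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws Exception {
        Summary s = new Summary();
        s.counts.put("rows", 42L);
        Summary copy = fromBytes(toBytes(s));
        System.out.println(copy.counts.get("rows")); // prints 42
    }
}
```

One caveat worth noting: plain Java serialization ties the stored bytes to the class definition, so a Writable or a hand-rolled byte layout is often preferred when the summary schema may change between job runs.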
