hbase-user mailing list archives

From bharath v <bharathvissapragada1...@gmail.com>
Subject Re: Question about MapReduce
Date Mon, 19 Oct 2009 16:57:42 GMT
Kevin: what if I want to implement a join of two tables? Is there an
alternative to TableInputFormat (TIF)? It reads only a single table at a
time. I have thought of a solution, but I am not sure whether it works.

Suppose we want to join table1 and table2, and we use TIF on table1. The
map phase would then be as follows.

Map:

Suppose TIF is reading region1 of table1. Then we somehow look up the start
and end keys of the table2 regions (if any) that are hosted on the same
machine where the map task is running, and read those table2 rows inside
the map. This way, data locality is preserved to some extent.
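The lookup step in the map could be sketched like this. It is only an illustration under stated assumptions: `Region` is a hypothetical stand-in for region boundary metadata (not an HBase class), row keys are plain Strings compared lexicographically (this matches HBase's unsigned byte ordering only for ASCII keys), and fetching the real boundaries and host names from the cluster is not shown.

```java
import java.util.ArrayList;
import java.util.List;

/*
 * Hypothetical sketch: given the key range of the table1 region a map task
 * is reading, find the table2 regions whose key ranges overlap it, and which
 * of those are served by the same node. Region is an illustrative stand-in,
 * not an HBase class. An empty start or end key means "unbounded", following
 * HBase's convention for the first and last regions of a table.
 */
final class RegionOverlap {

    static final class Region {
        final String start; // inclusive; "" = no lower bound
        final String end;   // exclusive; "" = no upper bound
        final String host;  // node serving this region (assumed known)

        Region(String start, String end, String host) {
            this.start = start;
            this.end = end;
            this.host = host;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")@" + host;
        }
    }

    // Half-open ranges overlap when each one starts before the other ends.
    // "" sorts first, so an empty start key already acts as "no lower bound";
    // only an empty end key needs a special case.
    static boolean overlaps(Region a, Region b) {
        boolean bStartsBeforeAEnds = a.end.isEmpty() || b.start.compareTo(a.end) < 0;
        boolean aStartsBeforeBEnds = b.end.isEmpty() || a.start.compareTo(b.end) < 0;
        return bStartsBeforeAEnds && aStartsBeforeBEnds;
    }

    // All table2 regions whose key ranges overlap the table1 split.
    static List<Region> overlapping(Region split, List<Region> table2) {
        List<Region> result = new ArrayList<>();
        for (Region r : table2) {
            if (overlaps(split, r)) {
                result.add(r);
            }
        }
        return result;
    }

    // The subset served by localHost; the others would need remote reads.
    static List<Region> localOverlapping(Region split, List<Region> table2, String localHost) {
        List<Region> result = new ArrayList<>();
        for (Region r : overlapping(split, table2)) {
            if (r.host.equals(localHost)) {
                result.add(r);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Region> table2 = List.of(
            new Region("", "g", "node1"),
            new Region("g", "p", "node2"),
            new Region("p", "", "node1"));
        // The table1 region this map task reads, running on node1.
        Region split = new Region("c", "k", "node1");
        System.out.println(overlapping(split, table2));                  // [[, g)@node1, [g, p)@node2]
        System.out.println(localOverlapping(split, table2, split.host)); // [[, g)@node1]
    }
}
```

Note that HBase assigns each table's regions independently, so overlapping key ranges do not imply the matching table2 regions live on the same node; in the example above one of the two overlapping regions is remote, and the map task would have to read it over the network.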

Is this feasible? Any comments?



On Fri, Oct 16, 2009 at 12:09 AM, Kevin Peterson <kpeterson@biz360.com> wrote:

> On Thu, Oct 15, 2009 at 11:30 AM, Something Something <luckyguy2050@yahoo.com> wrote:
>
> > 1) I don't think TableInputFormat is useful in this case. Looks like it's
> > used for scanning columns from a single HTable.
> > 2) TableMapReduceUtil - same problem. Seems like this works with just one
> > table.
> > 3) JV recommended NLineInputFormat, but my parameters are not in a file.
> > They come from multiple files and are in memory.
> >
> > I guess what I am looking for is something like... InMemoryInputFormat...
> > similar to FileInputFormat & DBInputFormat. There's no such class right
> > now.
> >
> > If worst comes to worst, I can write the parameters into a flat file and
> > use FileInputFormat - but that will slow down this process considerably.
> > Is there no other way?
> >
>
> So you need to pull input from multiple tables at once? Are you expecting
> to do a join on these tables? If you explain what the data looks like, we'd
> understand better. What are your tables, and what would you like to treat
> as a single input record?
>
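The InMemoryInputFormat idea from the quoted message boils down to cutting an in-memory list of parameters into splits, the way NLineInputFormat hands each mapper N lines. The helper below is a minimal, hypothetical sketch of only that chunking step (`InMemorySplits` and `chunk` are illustrative names, not a Hadoop API). A real InputFormat would also have to ship each chunk to the task JVMs, for example serialized inside its split; that part is not shown.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: groups in-memory parameters into
// chunks of at most perSplit items, one chunk per would-be map task.
final class InMemorySplits {

    static <T> List<List<T>> chunk(List<T> params, int perSplit) {
        if (perSplit <= 0) {
            throw new IllegalArgumentException("perSplit must be positive");
        }
        List<List<T>> splits = new ArrayList<>();
        for (int i = 0; i < params.size(); i += perSplit) {
            int end = Math.min(i + perSplit, params.size());
            // Copy the sublist so each chunk is independent of the source list.
            splits.add(new ArrayList<>(params.subList(i, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> params = List.of("a", "b", "c", "d", "e");
        System.out.println(chunk(params, 2)); // [[a, b], [c, d], [e]]
    }
}
```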
