hbase-user mailing list archives

From "Buttler, David" <buttl...@llnl.gov>
Subject RE: How to efficiently join HBase tables?
Date Fri, 17 Jun 2011 00:02:48 GMT
It depends on a couple of things.  If your LIST is a permanent feature of your document, then
it might make sense to add the list to the doc record (as a Boolean flag, or as the list index
if the list has a particular sort order).  Otherwise, a little simple programming can get you
the results you want:
1) Sort the list by DOCID (if it is big, then a map reduce job with an identity map / single
identity reducer would do the job).  If you require the order of the list to be maintained then
you need to add another field to the list indicating order, so that you can recover that after
the join.
2) Output a list of DOCID / UUID pairs from the DOCS table, sorted on DOCID
3) Use a double iterator through your two outputs to find the UUIDs for the DOCIDs in the list
(and optionally their order in the list)
4) Optionally re-sort the UUID list by the list order index

This will not be particularly fast, but it should be robust to large list sizes.
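
A minimal sketch of steps 1-4 in plain Python, with in-memory lists standing in for the two
sorted outputs.  The function name merge_join and the example list [3, 1, 7] are my own
assumptions for illustration (the list is picked so that it gives the result asked for below),
and I am assuming each DOCID appears only once in DOCS:

```python
def merge_join(doc_rows, list_rows):
    """Walk two DOCID-sorted inputs in step (the "double iterator").

    doc_rows:  (docid, uuid) pairs sorted by docid, docid assumed unique
    list_rows: (docid, position) pairs sorted by docid
    Yields (position, uuid) for every docid present in both inputs.
    """
    docs = iter(doc_rows)
    wanted = iter(list_rows)
    doc = next(docs, None)
    item = next(wanted, None)
    while doc is not None and item is not None:
        if doc[0] < item[0]:
            doc = next(docs, None)      # doc is not in the list, skip it
        elif doc[0] > item[0]:
            item = next(wanted, None)   # list entry has no matching doc
        else:
            yield item[1], doc[1]
            item = next(wanted, None)   # keep doc in case the list repeats it


# Hypothetical data: step 2 output, already sorted on DOCID.
docs_sorted = [(1, "sdsd"), (3, "hdhs"), (7, "gdhg"), (9, "shdg")]

# Step 1: remember each entry's position, then sort the list by DOCID.
doc_list = [3, 1, 7]
list_sorted = sorted((docid, pos) for pos, docid in enumerate(doc_list))

# Step 3: merge.  Step 4: re-sort by the original list position.
joined = sorted(merge_join(docs_sorted, list_sorted))
uuids_in_list_order = [uuid for _, uuid in joined]
print(uuids_in_list_order)
```

At real scale the two inputs would be sorted files or scanner output rather than Python lists,
but the comparison logic in the loop would stay the same.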

If your list can fit into the memory of a map task, then load it into a hash map in each map
task, and while you iterate over your docs table, output only the UUIDs and sort order of the
matching docs, and let your reducer reorder them according to your list order.
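
Roughly, the in-memory variant could look like the following plain-Python sketch.  It only
imitates the map and reduce phases with ordinary functions and does not use the Hadoop or HBase
APIs; map_doc, reduce_in_list_order and the sample data are hypothetical names and values, and
I am assuming the list holds no duplicate DOCIDs:

```python
def build_positions(doc_list):
    """Hash map from DOCID to its position in the list (loaded once per map task)."""
    return {docid: pos for pos, docid in enumerate(doc_list)}


def map_doc(docid, uuid, positions):
    """Map side: emit (position, uuid) only for docs whose DOCID is in the list."""
    pos = positions.get(docid)
    if pos is not None:
        yield pos, uuid


def reduce_in_list_order(pairs):
    """Reduce side: put the matched UUIDs back into list order."""
    return [uuid for _, uuid in sorted(pairs)]


# Hypothetical data; the DOCS rows do not need to be sorted for this variant.
docs_table = [(9, "shdg"), (1, "sdsd"), (7, "gdhg"), (3, "hdhs")]
positions = build_positions([3, 1, 7])

emitted = []
for docid, uuid in docs_table:
    emitted.extend(map_doc(docid, uuid, positions))

hash_join_result = reduce_in_list_order(emitted)
print(hash_join_result)
```

This needs only one pass over the docs table and no sorting of the table itself, which is why
it is worth doing whenever the list is small enough.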


-----Original Message-----
From: Florin P [mailto:florinpico@yahoo.com] 
Sent: Thursday, June 16, 2011 5:44 AM
To: user@hbase.apache.org
Subject: Re: How to efficiently join HBase tables?

   Regarding the same subject of joining, I have the following scenario:
1. I have a big table DOCS that contains the columns
      sdsd  1
      hdhs  3
      gdhg  7
      shdg  9
and so on (hope you got the idea)
2. an external list of docID 

 upon which I have to query ("join") the DOCS DOCID column, so that the result should
be   hdhs, sdsd, gdhg. How can I implement such a request? Could this be a
possible solution:
1. add a new column LIST (in the same column family) to the DOCS table
2. add a new record in it that contains my LIST of docID
3. "join" column LIST with the DOCID column? (perhaps a weird idea)

Thank you.
