hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: [LIKELY JUNK]Conditional scan by multiple columns
Date Tue, 04 Nov 2008 20:22:41 GMT
As Hui Ding says (smile).  With only 3M records, an index might be the way
to go (if you only need exact matches, store a hash of the cell contents
rather than the actual content).
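
To make the hash-of-contents idea concrete, here is a minimal sketch in plain Python. It does not use any HBase API: the in-memory dicts stand in for the data table and a hand-maintained index table, and the names `index_key`, `build_index` and `lookup`, the `/` separator, and the choice of MD5 are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict


def index_key(column: str, value: bytes) -> str:
    """Hypothetical index row key: column name plus a fixed-length hash of the cell."""
    return column + "/" + hashlib.md5(value).hexdigest()


def build_index(rows, indexed_columns):
    """Map each (column, hash of value) key to the set of row keys holding that value."""
    index = defaultdict(set)
    for row_key, cells in rows.items():
        for column in indexed_columns:
            if column in cells:
                index[index_key(column, cells[column])].add(row_key)
    return index


def lookup(index, conditions):
    """Return row keys that match every (column, value) condition exactly."""
    matches = None
    for column, value in conditions.items():
        keys = index.get(index_key(column, value), set())
        matches = set(keys) if matches is None else matches & keys
    return matches if matches is not None else set()


# Toy data standing in for the real table (cell values are bytes).
rows = {
    "row1": {"columnFamilyA:columnM": b"X", "columnFamilyB:columnN": b"Y"},
    "row2": {"columnFamilyA:columnM": b"X", "columnFamilyB:columnN": b"Z"},
    "row3": {"columnFamilyB:columnN": b"Y"},
}
index = build_index(rows, ["columnFamilyA:columnM", "columnFamilyB:columnN"])
result = lookup(index, {"columnFamilyA:columnM": b"X", "columnFamilyB:columnN": b"Y"})
```

Hashing keeps the index keys short and of fixed length however large the cells are. It only supports exact matches, and a careful version would re-check the actual cell value on the matched rows to rule out hash collisions.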

You could use HBase filters to filter on the server.  You can't write
your exact query as a single filter, because it looks like you want to
query across column families and, IIRC, filters do not run across column
families.

So, yeah, you would need to run an MR job to filter either the full row or
the 'filtered' row.
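
As a rough sketch of what the map step of such a job would check per row (plain Python, not the HBase or Hadoop API; `matches` and `map_filter` are hypothetical names, and each row is modelled as a dict keyed by `family:column`):

```python
def matches(cells, conditions):
    """True if the row holds the required value in every listed column."""
    return all(cells.get(column) == value for column, value in conditions.items())


def map_filter(rows, conditions):
    """Emit only the (row_key, cells) pairs that satisfy all conditions."""
    for row_key, cells in rows:
        if matches(cells, conditions):
            yield row_key, cells


# The query from the original question, spanning two column families.
conditions = {"columnFamilyA:columnM": b"X", "columnFamilyB:columnN": b"Y"}
table = [
    ("row1", {"columnFamilyA:columnM": b"X", "columnFamilyB:columnN": b"Y"}),
    ("row2", {"columnFamilyA:columnM": b"X", "columnFamilyB:columnN": b"Z"}),
    ("row3", {"columnFamilyB:columnN": b"Y"}),
]
selected = [row_key for row_key, _ in map_filter(table, conditions)]
```

The same predicate applies whether the job reads full rows or rows that a server-side filter has already narrowed on one family; in the second case fewer rows reach the map step.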

There are no 'indexes on columns' in hbase.


Ding, Hui wrote:
> If you have an index you probably don't need the mapreduce 
> -----Original Message-----
> From: Mekin Maheshwari [mailto:mekin.m@gmail.com] 
> Sent: Tuesday, November 04, 2008 3:49 AM
> To: hbase-user@hadoop.apache.org
> Subject: [LIKELY JUNK]Conditional scan by multiple columns
> Hi,
> I am a newbie, just got HBase installed and started playing with it.
> I want to perform something akin to :
> select rows where columnFamilyA:columnM = 'X' and columnFamilyB:columnN = 'Y'
> From what I have read, I would probably need to write a MapReduce task
> for this, possibly using GroupingTableMap
> Before I embark on doing this, I wanted to understand if:
> 1. Is this the right way to proceed, or am I missing other, simpler
> ways of achieving this?
> 2. What would be the performance implications of having queries with
> very different columns? Would I need to ensure that I have an index on
> all columns? What could be the size implications?
> To give you an idea of the eventual setup I want to be running this on:
> Approx # of rows - 3Million
> Number of column families : 20
> Number of columns would range from 50 to 30,000
> Thanks a ton,
> Mekin
