hbase-user mailing list archives

From acure <c...@xg.pl>
Subject Re: Conditional scan by multiple columns
Date Tue, 04 Nov 2008 23:11:41 GMT
Mekin Maheshwari wrote:
> Hi,
> I am a newbie, just got HBase installed and started playing with it.
>
> I want to perform something akin to :
>
> select rows where columnFamilyA:columnM = 'X' and  columnFamilyB:columnN =
> 'Y'
>
> From what I have read, I would probably need to write a MapReduce task for
> this, possibly using GroupingTableMap
>
> Before I embark on doing this, I wanted to understand if:
> 1. Is this the right way to proceed, or am I missing other, simpler ways
> of achieving this?
> 2. What would be the performance implications of having queries with very
> different columns? Would I need to ensure that I have an index on all
> columns? What would be the size implications?
>
> To give you an idea of the eventual setup I want to be running this on:
> Approx # of rows - 3Million
> Number of column families : 20
> Number of columns would range from 50 to 30,00
  Try Pigi (http://pigi-project.org); I think it should solve all the
problems you mentioned.
  I do see a problem with the number of columns, but are you really sure
that you need all combinations of columns?
  Most cases need only a few indexes.
  (The other way is to say: "hard drives are cheap, so let's store every
combination of all columns!")
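
  To illustrate the "few indexes" point, here is a minimal sketch. It assumes
plain in-memory Python dictionaries and is not the HBase or Pigi API; the
helper names build_index and select_rows are hypothetical. It keeps one index
per queried column and intersects the matching row keys, then counts how many
indexes a pair-per-index design would need for 50 columns.

```python
from collections import defaultdict
from math import comb  # available in Python 3.8+


def build_index(rows, column):
    """Map each value stored in `column` to the set of row keys holding it."""
    index = defaultdict(set)
    for row_key, cells in rows.items():
        if column in cells:
            index[cells[column]].add(row_key)
    return index


def select_rows(indexes, conditions):
    """Return row keys that satisfy every column = value condition.

    Each condition is answered from its own single-column index and the
    resulting key sets are intersected, so no combined index is needed.
    """
    matches = None
    for column, value in conditions.items():
        keys = indexes[column].get(value, set())
        matches = set(keys) if matches is None else matches & keys
    return matches if matches is not None else set()


# Toy data, invented for illustration; column names follow the question.
rows = {
    "row1": {"columnFamilyA:columnM": "X", "columnFamilyB:columnN": "Y"},
    "row2": {"columnFamilyA:columnM": "X", "columnFamilyB:columnN": "Z"},
    "row3": {"columnFamilyA:columnM": "W", "columnFamilyB:columnN": "Y"},
}
columns = ("columnFamilyA:columnM", "columnFamilyB:columnN")
indexes = {column: build_index(rows, column) for column in columns}

result = select_rows(
    indexes,
    {"columnFamilyA:columnM": "X", "columnFamilyB:columnN": "Y"},
)
print(sorted(result))  # ['row1']

# One index per column pair grows quadratically: 50 columns give 1225 pairs.
print(comb(50, 2))  # 1225
```

  With single-column indexes the index count grows linearly with the number
of queried columns, while a dedicated index for every pair of 50 columns
already needs 1225 of them, which is why indexing only the columns that are
actually queried is usually the cheaper choice.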

      Antony
