hbase-user mailing list archives

From Alt Control <altcontrolb...@gmail.com>
Subject Re: How to apply multiple row filters in an efficient way?
Date Wed, 06 Jul 2011 20:59:26 GMT
Thank you St.Ack,

With StartRow I need to pass the full row key, but since my key is made of
date+ticker I can't do that (I know the desired date, but don't always know
the ticker). Is there a way to do it?
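
HBase orders and compares row keys byte by byte, so a scan's start row does not have to be a complete key: a date-only prefix is enough to position the scan at the first row for that date. A minimal sketch of computing such bounds follows. It uses plain Java with no HBase dependency, and it assumes (this is not stated in the thread) that the date is a fixed-width string such as yyyyMMdd at the front of the key. The class and method names are hypothetical; the resulting arrays are what one would hand to setStartRow and setStopRow.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: derives scan bounds from a row-key prefix.
final class PrefixRange {

    // Start row (inclusive): the prefix itself sorts before every key that extends it.
    static byte[] startRow(byte[] prefix) {
        return Arrays.copyOf(prefix, prefix.length);
    }

    // Stop row (exclusive): the smallest key that sorts after every key with this prefix.
    static byte[] stopRow(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0]; // an empty stop row means "scan to the end of the table"
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // bump the last remaining byte
        return stop;
    }

    public static void main(String[] args) {
        // Assumed key layout: yyyyMMdd followed by the ticker.
        byte[] date = "20110706".getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(startRow(date), StandardCharsets.UTF_8));
        System.out.println(new String(stopRow(date), StandardCharsets.UTF_8));
    }
}
```

With these bounds the scan covers every ticker for the given date without knowing any ticker in advance.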

The same applies to the other part of the question: how can I filter on
the suffix of the key (the ticker) without using regex?
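
One regex-free possibility is to compare the trailing bytes of the row key directly against each wanted ticker, which avoids converting every key to a String. The sketch below shows only that byte comparison in plain Java. The class and method names are hypothetical, and how the check is wired into a scan (for example inside a custom filter, or applied client-side to returned rows) is left open here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: byte-level suffix checks on row keys, no regex or String conversion.
final class TickerSuffix {

    // True if rowKey ends with exactly the bytes in suffix.
    static boolean endsWith(byte[] rowKey, byte[] suffix) {
        int offset = rowKey.length - suffix.length;
        if (offset < 0) {
            return false; // key is shorter than the suffix
        }
        for (int i = 0; i < suffix.length; i++) {
            if (rowKey[offset + i] != suffix[i]) {
                return false;
            }
        }
        return true;
    }

    // True if rowKey ends with any of the given tickers.
    static boolean matchesAny(byte[] rowKey, List<byte[]> tickers) {
        for (byte[] ticker : tickers) {
            if (endsWith(rowKey, ticker)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed key layout: date followed by ticker.
        byte[] key = "20110706GOOG".getBytes(StandardCharsets.UTF_8);
        List<byte[]> wanted = Arrays.asList(
                "GOOG".getBytes(StandardCharsets.UTF_8),
                "IBM".getBytes(StandardCharsets.UTF_8));
        System.out.println(matchesAny(key, wanted));
    }
}
```

A plain suffix test also accepts a longer ticker that happens to end with the same letters, so if the date part has a fixed width it is safer to additionally check that the key length equals the date length plus the ticker length.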

Thanks again

On Wed, Jul 6, 2011 at 4:14 PM, Stack <stack@duboce.net> wrote:

> On Tue, Jul 5, 2011 at 1:02 PM, Alt Control <altcontrolblog@gmail.com> wrote:
> > Question is - how can I do that efficiently? I don't know if HBase allows me
> > to set multiple filters in a single Scan object,
> > but I can do that with regex (for example (GOOG|IBM|DELL|.......|n|)), but
> > is this the right way?
> >
>
> You can pass lists of filters.  See
>
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FilterList.html
>
> For scanning during a certain time, make your Scan start (and
> optionally end) within the time you are interested in by passing the
> appropriate start and stop keys:  See setStartRow and setStopRow in
> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html.
>
> FYI, avoid regexes if you can.  They are costly.  HBase is all about
> bytes, so to do the check it needs to go from bytes to String, then run
> the regex, and do this for each compare of all values.  It adds up.
>
> St.Ack
>
