hbase-user mailing list archives

From Dave Latham <lat...@davelink.net>
Subject Re: Row Filters in TableInputFormatBase
Date Sun, 08 Feb 2009 02:08:08 GMT
I've opened HBASE-1190 for it.  Looking through the other code, it seems
the pattern is to wrap a StopRowFilter in a WhileMatchRowFilter so that it
will filterAllRemaining once it hits the stop row, so I've submitted a patch
to do that.  It does seem, however, like the StopRowFilter should know to
filterAllRemaining itself once the stop row is reached, and not require a
WhileMatchRowFilter.
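
To make the difference concrete, here is a minimal Java sketch of the two
behaviours.  It deliberately does not use the real HBase classes:
SimpleRowFilter, StopRow, WhileMatch and rowsExamined are hypothetical
stand-ins that only model the filterRowKey / filterAllRemaining contract
described in this thread, and the scan loop is an assumption about how the
scanner consults the filter, not the actual region server code.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for illustration only; NOT the real HBase filter API.
class StopRowSketch {

    interface SimpleRowFilter {
        boolean filterRowKey(String rowKey); // true = exclude this row
        boolean filterAllRemaining();        // true = the scan may stop
    }

    // Models the behaviour reported in this thread: rows at or past the
    // stop row are excluded, but the filter never tells the scan to end.
    static class StopRow implements SimpleRowFilter {
        private final String stopRow;
        StopRow(String stopRow) { this.stopRow = stopRow; }
        public boolean filterRowKey(String rowKey) {
            return rowKey.compareTo(stopRow) >= 0;
        }
        public boolean filterAllRemaining() { return false; }
    }

    // Models the wrapper pattern: once the inner filter excludes a row,
    // report that everything remaining can be skipped.
    static class WhileMatch implements SimpleRowFilter {
        private final SimpleRowFilter inner;
        private boolean done = false;
        WhileMatch(SimpleRowFilter inner) { this.inner = inner; }
        public boolean filterRowKey(String rowKey) {
            if (inner.filterRowKey(rowKey)) {
                done = true;
            }
            return done;
        }
        public boolean filterAllRemaining() {
            return done || inner.filterAllRemaining();
        }
    }

    // Counts how many rows a scan touches before the filter lets it stop.
    static int rowsExamined(SimpleRowFilter filter, List<String> sortedRows) {
        int examined = 0;
        for (String row : sortedRows) {
            if (filter.filterAllRemaining()) {
                break;
            }
            examined++;
            filter.filterRowKey(row);
        }
        return examined;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c", "d", "e", "f");
        // Bare stop-row filter: the scan runs to the end of the table.
        System.out.println(rowsExamined(new StopRow("c"), rows)); // prints 6
        // Wrapped filter: the scan ends right after reaching the stop row.
        System.out.println(rowsExamined(new WhileMatch(new StopRow("c")), rows)); // prints 3
    }
}
```

In this toy model the unwrapped filter touches all six rows, while the
wrapped one stops after the first excluded row, which is the effect the
patch is after for each split.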

Dave

On Sat, Feb 7, 2009 at 1:21 PM, stack <stack@duboce.net> wrote:

> On Wed, Feb 4, 2009 at 4:09 PM, Dave Latham <latham@davelink.net> wrote:
>
> > In order to speed up a map reduce job operating on HBase input data, we
> > recently added a RowFilter to the input format.  However, when trying to
> > execute it, map tasks (one per region) that used to take 1-2 minutes
> > began timing out after 10 minutes.  So I dug in to TableInputFormatBase
> > to see how it handles a row filter, and it appears to take our filter and
> > combine it with a StopRowFilter in order to scan the proper split, since
> > there is no getScanner method that can accept both a stop row and a row
> > filter.  Digging further in to the scanning / filtering, it looks like it
> > continues scanning until filterAllRemaining returns true.  However,
> > StopRowFilter.filterAllRemaining() always returns false.  So if my
> > understanding is correct, every split in this task will end up scanning
> > to the end of the table and testing every row with the filter instead of
> > simply stopping at the end of its given split.  That would explain why my
> > map tasks began taking longer (instead of shorter).
>
>
> > 1. Is my understanding correct?  (aka is this a bug?  If so, I don't see
> > an existing JIRA issue for it -- I can open one if no one else does.)
>
>
> Sounds like a bug (and an explanation for long-running jobs) but, IIUC,
> the stop row filter is supposed to have a 'stop row' embedded, and once the
> filter passes it, we stop filtering?  If that's not going on, let's fix it.
>
> St.Ack
> P.S. Thanks for digging in.
>
