hbase-user mailing list archives

From Bill Graham <billgra...@gmail.com>
Subject Re: feature request (count)
Date Fri, 03 Jun 2011 22:50:41 GMT
One alternative is to calculate some stats during compactions and
store them somewhere for retrieval. The metrics wouldn't be up to date, of
course, since they'd be stats from the last compaction time. I think that
would still be useful info to have, but it's different from what's being
requested.
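
Here is a minimal sketch of that idea. It assumes a simplified cell of (row, qualifier, timestamp, value) and ignores delete markers, TTLs and version limits. HBase compacts each store (one column family within one region) on its own, so the sorted merge only ever sees one family. Because the merged stream is sorted by row, counting distinct rows while streaming costs one comparison per cell.

The names `compact_with_stats` and `StoreStats` are hypothetical and not part of HBase. Where the stats would be persisted is left open. The count covers only the files being merged, so it reflects the whole store only for a major compaction, and even then it excludes data still in the memstore. A table-wide figure would still have to be summed over regions.

```python
import heapq
from dataclasses import dataclass

# Hypothetical, simplified cell: (row, qualifier, timestamp, value).
# The family is implied because a compaction runs per store (per family).
# Real cells also carry a type (Put/Delete); this sketch ignores it.


def cell_sort_key(cell):
    row, qualifier, ts, _value = cell
    # Order by row and qualifier, newest timestamp first.
    return (row, qualifier, -ts)


@dataclass
class StoreStats:
    cell_count: int = 0
    row_count: int = 0  # distinct rows with at least one cell in this family


def compact_with_stats(files):
    """Merge already-sorted store files and count rows while streaming."""
    stats = StoreStats()
    output = []
    previous_row = None
    for cell in heapq.merge(*files, key=cell_sort_key):
        output.append(cell)
        stats.cell_count += 1
        # Sorted by row, so a new row is simply a change of row key.
        if cell[0] != previous_row:
            stats.row_count += 1
            previous_row = cell[0]
    return output, stats


store_file_1 = [(b"a", b"q1", 2, b"new"), (b"b", b"q1", 1, b"y")]
store_file_2 = [(b"a", b"q1", 1, b"old"), (b"a", b"q2", 1, b"z"), (b"c", b"q1", 1, b"w")]
merged, stats = compact_with_stats([store_file_1, store_file_2])
print(stats)  # StoreStats(cell_count=5, row_count=3)
```

The example also shows that the per-file entry count (5 cells) and the row count (3 rows) diverge.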


On Fri, Jun 3, 2011 at 3:40 PM, Jack Levin <magnito@gmail.com> wrote:

> "Each HFile knows how many KV entries there are in it, but this does
> not map in a general way to the
> number of rows, or the number of rows with a specific column."
>
> It would be nice to have an index like that; it would solve a lot of
> issues for people migrating from MySQL.  I assume that without a
> 'count' feature, people are resorting to storing dataset elements in
> other engines, which is not great, since you then end up requiring a
> non-HBase index to be consistent and authoritative for all of your
> datasets that require counts.
>
> -Jack
>
>
> On Fri, Jun 3, 2011 at 3:24 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> > This is a commonly requested feature, and it remains unimplemented
> > because it is actually quite hard.  Each HFile knows how many KV
> > entries there are in it, but this does not map in a general way to the
> > number of rows, or the number of rows with a specific column. Keeping
> > track of the row count as new rows are created is also not as easy as
> > it seems - this is because a Put does not know if a row already exists
> > or not.  Making it aware of that fact would require doing a get before
> > a put - not cheap.
> >
> > -ryan
> >
> > On Fri, Jun 3, 2011 at 3:20 PM, Jack Levin <magnito@gmail.com> wrote:
> >> I have a feature request: there should be a native function called
> >> 'count' that produces a count of rows based on a specific family
> >> filter, is internal to HBase, and doesn't need to read cells off the
> >> disk/cache.  Just count up the rows in the most efficient way
> >> possible.  I realize that family definitions are part of the cells, so
> >> it would be nice to have an index that somehow produces a low IO/CPU
> >> hit on HBase when doing a count (for example, enabling an index like
> >> that in the table schema would be how you turn it on for a specific
> >> family).
> >>
> >> Best,
> >>
> >> -Jack
> >>
> >
>
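
Ryan's two points can be illustrated with a toy in-memory table. This is a hypothetical stand-in, not the HBase client API, and it has no versions, deletes or concurrent writers.

- **Entry counts track cells, not rows.** Each put adds or overwrites one cell, so the cell (KV) count grows with columns rather than rows.
- **A blind counter overcounts.** A counter bumped on every Put overcounts whenever a row is written more than once.
- **A correct counter needs a read.** Keeping the count right requires checking whether the row exists before writing, which is the extra read Ryan mentions. In a real cluster that check and the increment would also have to be atomic, or two concurrent writers could both count the same new row.

```python
class ToyTable:
    """Hypothetical in-memory table: row -> {(family, qualifier): value}.

    Not the HBase API; there are no versions, deletes, or concurrent writers.
    """

    def __init__(self):
        self.rows = {}
        self.blind_count = 0    # bumped on every put, like a naive counter
        self.checked_count = 0  # bumped only when the put creates a new row

    def put(self, row, family, qualifier, value):
        self.blind_count += 1
        # The "get before a put": look the row up before writing to it.
        if row not in self.rows:
            self.checked_count += 1
            self.rows[row] = {}
        self.rows[row][(family, qualifier)] = value

    def kv_count(self):
        # Counted per cell, like an HFile's KV entry count, not per row.
        return sum(len(cells) for cells in self.rows.values())

    def row_count(self):
        return len(self.rows)

    def rows_with_family(self, family):
        # The count requested above: rows with at least one cell in `family`.
        return sum(
            1 for cells in self.rows.values()
            if any(fam == family for fam, _qualifier in cells)
        )


table = ToyTable()
table.put(b"r1", b"f", b"a", b"1")
table.put(b"r1", b"f", b"b", b"2")
table.put(b"r1", b"f", b"a", b"3")  # overwrites an existing cell
table.put(b"r2", b"g", b"a", b"4")
print(table.kv_count(), table.row_count(), table.blind_count, table.checked_count)
# prints: 3 2 4 2
```

Only `checked_count` agrees with `row_count`, and it gets there by paying for a lookup on every write.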
