hbase-user mailing list archives

From Carl Austin <carl.aus...@gmail.com>
Subject Re: Accumulo iterators in HBase
Date Wed, 02 Jul 2014 07:07:51 GMT
Thanks for taking the time to look and comment; glad it sounds interesting.

The reason I started on this was that I'm using Accumulo and want to make
an application usable on both HBase and Accumulo with the same codebase. I
do a lot of aggregations on data and I feel the Accumulo iterator mechanism
is superior for this use case; it's one of the main reasons I went with
Accumulo and one of the only remaining major differences between the two
systems now that HBase has implemented cell-level ACLs.
For example, as I ingest a main table of data I am creating many
other question-focused tables that keep answers such as: how many times did I
see each combination of values, when was the last time I saw a combination
together, how many distinct values were in this field for each combination
(using probabilistic counting, of course), and many more. All of these
are well suited to Accumulo iterators for performance at scale, because
the iterators run at compaction time across key/values that are already being
read at that point, rather than having to update the answers to these
questions on every single insert.
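
To make the compaction-time idea concrete, here is a minimal, language-neutral sketch in Python. It is not the Accumulo or HBase API: `summing_combiner`, `compact`, and the sample "files" are hypothetical names made up for illustration. It assumes writes only append a partial count of 1 per observed combination, and that the partial counts are summed when sorted files are merged, which is the point at which the entries are being read anyway.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def summing_combiner(sorted_entries):
    """Collapse adjacent entries that share a key into one entry holding their sum."""
    for key, group in groupby(sorted_entries, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


def compact(*sorted_files):
    """Merge several key-sorted files into one, applying the combiner on the way."""
    merged = heapq.merge(*sorted_files, key=itemgetter(0))
    return list(summing_combiner(merged))


# Hypothetical data: ingest appends a cheap partial count per sighting and
# never reads the existing total back at write time.
file_a = [(("red", "circle"), 1), (("red", "square"), 1)]
file_b = [(("blue", "circle"), 1), (("red", "circle"), 1), (("red", "circle"), 1)]

compacted = compact(file_a, file_b)
```

The same shape would apply to "last seen" (take the maximum timestamp instead of the sum) or to distinct counts (merge probabilistic sketches instead of adding integers); only the combining function changes.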

This use case won't be for everyone, but the iterator mechanism is pretty
neat, powerful and a real differentiator in Accumulo (of course there are
many differentiators in HBase too!).



On Tue, Jul 1, 2014 at 6:57 PM, Stack <stack@duboce.net> wrote:

> Interesting project, Carl.  Use the Cell interface instead of KeyValue if you
> can (especially given you are copying to Accumulo key/value).  What are you
> thinking? What would be the use case?
> Thanks,
> St.Ack
> On Tue, Jul 1, 2014 at 2:43 AM, Carl Austin <carl.austin@gmail.com> wrote:
> > Hi,
> >
> > I've recently been doing a little research into getting Accumulo iterators
> > working in HBase, and in my very basic example I seem to have been able to
> > do this for all three scopes (scan, minor compaction and major compaction,
> > or scan, flush and compaction in HBase terminology).
> >
> > I was hoping that an HBase guru would be able to take a look at my approach
> > - https://github.com/carlaustin/hbase-accumulo-iterators. It's very simple,
> > just 7 small classes.
> >
> > I've done it by creating wrappers that can convert from Accumulo iterators
> > to HBase scanners and back, allowing me to wrap a scanner as an iterator,
> > hand it to an Accumulo iterator as the start of an iterator chain, and then
> > wrap that back to a scanner and return it. I've then used a RegionObserver
> > to implement this on flush, compact and scan.
> >
> > You can see from the example I've done no iterator management or anything
> > at this point; it simply applies an iterator that changes all values to the
> > word "carl" for a table called "test". If it looks like this is a go-er
> > then I would look to continue work.
> >
> > I'd really appreciate any comments on the approach or on things I've missed,
> > even if they make this a total non-starter.
> >
> > Thanks
> >
> > Carl
> >
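
The wrapping approach described in the quoted message can be sketched roughly as follows. This is a conceptual Python model, not the code from the linked repository and not the real HBase or Accumulo interfaces. Every class and method name here (`ListScanner`, `ScannerAsIterator`, `ConstantValueIterator`, `IteratorAsScanner`, `has_top`, `top`, `advance`) is a hypothetical stand-in. It assumes a scanner hands back one row of cells per call, and that an iterator exposes a single "top" entry at a time.

```python
class ListScanner:
    """Stand-in scanner: next(out) appends one row's cells, returns True if more rows remain."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    def next(self, out):
        if self._pos < len(self._rows):
            out.extend(self._rows[self._pos])
            self._pos += 1
        return self._pos < len(self._rows)


class ScannerAsIterator:
    """Expose a row-at-a-time scanner as a cell-at-a-time iterator."""

    def __init__(self, scanner):
        self._scanner = scanner
        self._buffer = []
        self._more = True
        self._fill()

    def _fill(self):
        # Pull rows until a cell is buffered or the scanner is exhausted.
        while not self._buffer and self._more:
            self._more = self._scanner.next(self._buffer)

    def has_top(self):
        return bool(self._buffer)

    def top(self):
        return self._buffer[0]

    def advance(self):
        self._buffer.pop(0)
        self._fill()


class ConstantValueIterator:
    """Example link in the chain: rewrites every value to a fixed string."""

    def __init__(self, source, value):
        self._source = source
        self._value = value

    def has_top(self):
        return self._source.has_top()

    def top(self):
        row, column, _ = self._source.top()
        return (row, column, self._value)

    def advance(self):
        self._source.advance()


class IteratorAsScanner:
    """Wrap the end of an iterator chain back into a row-at-a-time scanner."""

    def __init__(self, iterator):
        self._it = iterator

    def next(self, out):
        if not self._it.has_top():
            return False
        row = self._it.top()[0]
        # Emit every cell belonging to the current row, then stop.
        while self._it.has_top() and self._it.top()[0] == row:
            out.append(self._it.top())
            self._it.advance()
        return self._it.has_top()


def wrap(scanner):
    """Scanner -> iterator chain -> scanner, as a region hook might do it."""
    return IteratorAsScanner(ConstantValueIterator(ScannerAsIterator(scanner), "carl"))


def drain(scanner):
    """Read a scanner to exhaustion and return all cells."""
    cells, more = [], True
    while more:
        more = scanner.next(cells)
    return cells


# Hypothetical cells as (row, column, value) tuples.
rows = [[("r1", "a", "x"), ("r1", "b", "y")], [("r2", "a", "z")]]
result = drain(wrap(ListScanner(rows)))
```

In this model the wrapped scanner is a drop-in replacement for the original one, which is why the same chain could in principle be attached at flush, compaction, and scan time.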
