hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: HBase increment of a specific cell version
Date Tue, 19 Dec 2017 17:52:11 GMT
On Mon, Dec 18, 2017 at 8:17 PM, Alex Loffler <alex@loffler.org> wrote:

> Hi Stack,
>
> Thanks for the response. I am trying to maintain an hourly count of
> messages between two keys/entities: sender->recipient, e.g. a->b
>
> There are multiple ways of modelling this, but one that seems to fit
> nicely is:
> Row key = a
> Col = b
> Timestamp/version = e.g. hour-of-day or hour-of-epoch
> Val = count of messages
>
> This approach utilizes the three dimensions of rowkey, col & version
> nicely.
>
> I will never need to look messages up by recipient, but will frequently
> query for all recipients contacted by a sender (i.e. return the
> value (count) for each column (recipient) for a specific rowkey (sender)
> during a particular timespan, i.e. at version x).
>
> Everything is in place for this to work except the ability to increment a
> specific version of a cell per the above.
>
> If I don’t keep count (increment) and just write a flag to represent a
> message between the two, this scheme/approach scales really nicely with the
> put version of addColumn.
>
> If there’s a better pattern/approach, I’d really appreciate a pointer in
> the right direction.
>
I see. Makes sense. Nice.

You can't use Increment as is. Its model is hard-baked: it reads the most
recent long, adds to it, and writes the new long value back, all while
holding an exclusive row lock. You'd need to change Increment so that it
updates at an explicit version.

The way we do Increments today is 'convenient' but dog slow. Rather, there
should be a means of recording only the increment values (pure writes) and
then aggregating them at read time. Can you cast your model this way at
all?
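
As a rough illustration of the record-then-aggregate idea, here is a minimal
in-memory Java sketch. It is not HBase client code: the class DeltaLog and its
methods record, count and hourOfEpoch are hypothetical names, and a plain map
stands in for the table. The write path only appends a delta under
(sender, recipient, hour-of-epoch); the read path sums whatever was recorded.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical in-memory stand-in for a table; illustrates the idea only.
final class DeltaLog {
    // Key: (sender, recipient, hourOfEpoch). Value: every delta recorded there.
    private final Map<List<Object>, List<Long>> deltas = new ConcurrentHashMap<>();

    // Bucket a millisecond timestamp into an hour-of-epoch "version".
    static long hourOfEpoch(long epochMillis) {
        return Math.floorDiv(epochMillis, 3_600_000L);
    }

    private static List<Object> key(String sender, String recipient, long hour) {
        return Arrays.<Object>asList(sender, recipient, hour);
    }

    // Write path: append the delta; the existing value is never read back.
    void record(String sender, String recipient, long epochMillis, long delta) {
        deltas.computeIfAbsent(key(sender, recipient, hourOfEpoch(epochMillis)),
                k -> new CopyOnWriteArrayList<>()).add(delta);
    }

    // Read path: aggregate all deltas recorded for one hour bucket.
    long count(String sender, String recipient, long hour) {
        List<Long> recorded = deltas.get(key(sender, recipient, hour));
        if (recorded == null) {
            return 0L;
        }
        long sum = 0L;
        for (long d : recorded) {
            sum += d;
        }
        return sum;
    }

    public static void main(String[] args) {
        DeltaLog log = new DeltaLog();
        long now = System.currentTimeMillis();
        log.record("a", "b", now, 1L);
        log.record("a", "b", now, 1L);
        System.out.println(log.count("a", "b", hourOfEpoch(now)));
    }
}
```

The trade-off in this sketch is that reads get more expensive as deltas pile
up, so a real design would presumably compact or roll them up periodically.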

For now, you could checkAndPut to an explicit coordinate: read the old
value, then write back the new one. This will be a costly op, though. You
could cut out the client-server round trips by floating a coprocessor
endpoint on the server that does your increment-at-an-explicit-coordinate,
but it'd still be a read-modify-write.
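
The read-modify-write that such a loop performs can be sketched in memory as
follows. This is a stand-in under stated assumptions, not the HBase API:
VersionedCounter, incrementAt and get are hypothetical names, and
ConcurrentHashMap's putIfAbsent/replace play the role of the conditional
write. Each attempt reads the old value at an explicit (row, column, version)
coordinate and retries if another writer got in first.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory model of increment-at-an-explicit-coordinate.
final class VersionedCounter {
    // Key: (row, column, version). Value: the counter stored at that coordinate.
    private final ConcurrentMap<List<Object>, Long> cells = new ConcurrentHashMap<>();

    long incrementAt(String row, String column, long version, long delta) {
        List<Object> coord = Arrays.<Object>asList(row, column, version);
        while (true) {
            Long old = cells.get(coord);                       // read
            long updated = (old == null ? 0L : old) + delta;   // modify
            boolean swapped = (old == null)
                    ? cells.putIfAbsent(coord, updated) == null // write if still absent
                    : cells.replace(coord, old, updated);       // write if unchanged
            if (swapped) {
                return updated;
            }
            // Another writer changed the cell in between; read again and retry.
        }
    }

    long get(String row, String column, long version) {
        Long value = cells.get(Arrays.<Object>asList(row, column, version));
        return value == null ? 0L : value;
    }

    public static void main(String[] args) {
        VersionedCounter counter = new VersionedCounter();
        counter.incrementAt("a", "b", 420000L, 1L);
        counter.incrementAt("a", "b", 420000L, 1L);
        System.out.println(counter.get("a", "b", 420000L));
    }
}
```

Against a real cluster each pass through the loop would cost client-server
round trips, which is why this approach is expensive under contention.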

Let us know if we can help in any way, Alex.
S

> -Alex.
>
> > On Dec 18, 2017, at 8:49 AM, Stack <stack@duboce.net> wrote:
> >
> > Hello Alex. We don't have such an ability. Can you say what the use case
> > is, because I at least am having trouble understanding why you would want
> > to do such a thing.
> >
> > Thank you,
> > S
> >
> >> On Wed, Dec 13, 2017 at 2:07 PM, Alex Loffler <alex@loffler.org> wrote:
> >>
> >> Hi Folks,
> >>
> >> I am using HBase’s timestamp/version concept to track
> >> aggregates/counts for time periods/spans.
> >>
> >> The put function allows me to update a specific version, ie.
> >> put(rk).addColumn(cf, column, version, value)
> >>
> >> But I can’t find a way of incrementing a specific version ie.
> >> increment(rk).addColumn(cf, column, version, value) doesn’t exist.
> >>
> >> I can only find increment(rk).addColumn(cf, column, value), which
> >> exhibits the default behaviour of taking the latest version of the cell,
> >> incrementing its value and updating the timestamp/version with
> >> current-timestamp-millis.
> >>
> >> What I’d really like is an increment to the value in the specified
> >> cell/version without the version update.
> >>
> >> Am I missing something, is this not possible for some reason I’m not
> >> getting, or would it be a good feature request?
> >>
> >> Thanks again for a fantastic platform!
> >> -Alex.
> >>