hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: question about scanning data
Date Mon, 13 Jun 2011 19:30:05 GMT
I think you are confusing a few things; I'll try to clear them up inline.


On Fri, Jun 10, 2011 at 8:27 PM, Sam Seigal <selekt86@yahoo.com> wrote:
> Hi All,
> I had a question about a certain kind of query I would like to do in HBase.
> I am storing records in HBase that transition from an initial state "A" to
> an end state "B".
> Initially, the record I will store will look like the following ->
> t1:rowid:columnFamily:A <value>
> when I get a notification that the state has changed, I will write the
> following value ->
> t2:rowid:columnFamily:B <value>
> I basically end up with two versions of the same row.

This is not how it works in HBase: versions are kept per cell (row,
family, qualifier), not per row. Here you end up with one row that has
two columns, since A and B are different qualifiers.
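To make the distinction concrete, here is a toy in-memory model of a single row in plain Java. It is not the HBase client API: `ToyRow`, `CellVersionsDemo` and their methods are hypothetical names that only mimic HBase's logical layout, where each column (family:qualifier) holds its own versions keyed by timestamp.

```java
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model of ONE HBase row (hypothetical, not the real client API):
// column "family:qualifier" -> (timestamp -> value), newest timestamp first.
class ToyRow {
    final NavigableMap<String, NavigableMap<Long, String>> columns = new TreeMap<>();

    // Mimics storing a cell: (family, qualifier) picks the column,
    // the timestamp picks the version inside that column.
    void put(String family, String qualifier, long ts, String value) {
        columns.computeIfAbsent(family + ":" + qualifier,
                        k -> new TreeMap<Long, String>(Comparator.reverseOrder()))
                .put(ts, value);
    }
}

class CellVersionsDemo {
    // What the original mail does: A at t1, then B at t2.
    static ToyRow twoQualifiers() {
        ToyRow row = new ToyRow();
        row.put("cf", "A", 1L, "v1");
        row.put("cf", "B", 2L, "v2");
        return row;
    }

    // Two writes to the SAME qualifier: this is what creates versions.
    static ToyRow sameQualifierTwice() {
        ToyRow row = new ToyRow();
        row.put("cf", "state", 1L, "A");
        row.put("cf", "state", 2L, "B");
        return row;
    }

    public static void main(String[] args) {
        System.out.println(twoQualifiers().columns);      // {cf:A={1=v1}, cf:B={2=v2}}
        System.out.println(sameQualifierTwice().columns); // {cf:state={2=B, 1=A}}
    }
}
```

Writing A and B gives two columns with one version each; only repeated writes to the same qualifier give one column with several versions.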

> Now, I want to query for all the records that have NOT transitioned to state
> B yet.
> Is it possible to express a query in HBase where one can say "retrieve only
> row Id values where there exists a column qualifier A but not B"?

What you describe is essentially a secondary index, which HBase doesn't
support out of the box. Search for "HBase secondary index" and you should
find plenty of discussions on the subject.
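One common way to build such an index by hand is sketched below, again as a toy in-memory model rather than real HBase code. The assumption is that the client itself maintains a separate "pending" index (in practice a second table) next to the main table: writing state A adds the row id, writing state B removes it. `PendingIndexDemo` and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model (hypothetical, not HBase code): the main table maps rowid -> the
// qualifiers present in family "cf"; pendingIndex is a hand-maintained index
// of rows that have A but not yet B (in practice this would be a second table).
class PendingIndexDemo {
    final Map<String, NavigableSet<String>> mainTable = new TreeMap<>();
    final NavigableSet<String> pendingIndex = new TreeSet<>();

    void recordStateA(String rowId) {
        mainTable.computeIfAbsent(rowId, k -> new TreeSet<>()).add("A");
        pendingIndex.add(rowId);    // index entry: row is waiting for B
    }

    void recordStateB(String rowId) {
        mainTable.computeIfAbsent(rowId, k -> new TreeSet<>()).add("B");
        pendingIndex.remove(rowId); // transition done: drop the index entry
    }

    // With the index, "A but not B" is a cheap read of the index only.
    NavigableSet<String> pendingFromIndex() {
        return new TreeSet<>(pendingIndex);
    }

    // Without it, every row must be read and checked on the client.
    NavigableSet<String> pendingByFullScan() {
        NavigableSet<String> out = new TreeSet<>();
        for (Map.Entry<String, NavigableSet<String>> e : mainTable.entrySet()) {
            if (e.getValue().contains("A") && !e.getValue().contains("B")) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        PendingIndexDemo db = new PendingIndexDemo();
        db.recordStateA("row1");
        db.recordStateA("row2");
        db.recordStateB("row1");
        System.out.println(db.pendingFromIndex());  // [row2]
        System.out.println(db.pendingByFullScan()); // [row2]
    }
}
```

The catch is that HBase only guarantees atomicity within a single row, so the main-table write and the index update are two separate operations and can drift apart if a client dies between them.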

> How can I do this?
> I tried doing the following through the HBase shell. I had the following
> values stored:
> t1:rowid:cf:A
> t2:rowid:cf:B
> I did a query for "rowid" with VERSIONS => 1. However, this gives me both A
> and B qualifier values. I am only interested in values that have not yet
> transitioned to B.

Yep: VERSIONS limits the number of versions returned per column, and
since A and B have one version each, they both get returned.
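The toy model below (plain Java with hypothetical names, not the HBase API) mimics how a version limit is applied: it caps the versions returned for each column separately, so it never hides one column in favour of a newer one.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model (hypothetical, not the HBase API) of one row:
// column "cf:qualifier" -> (timestamp -> value), newest timestamp first.
class MaxVersionsDemo {
    final NavigableMap<String, NavigableMap<Long, String>> row = new TreeMap<>();

    void put(String column, long ts, String value) {
        row.computeIfAbsent(column, k -> new TreeMap<Long, String>(Comparator.reverseOrder()))
           .put(ts, value);
    }

    // Mimics a get with VERSIONS => maxVersions: the cap is per column.
    NavigableMap<String, NavigableMap<Long, String>> get(int maxVersions) {
        NavigableMap<String, NavigableMap<Long, String>> result = new TreeMap<>();
        for (Map.Entry<String, NavigableMap<Long, String>> col : row.entrySet()) {
            NavigableMap<Long, String> kept = new TreeMap<Long, String>(Comparator.reverseOrder());
            for (Map.Entry<Long, String> v : col.getValue().entrySet()) {
                if (kept.size() == maxVersions) break; // newest versions come first
                kept.put(v.getKey(), v.getValue());
            }
            result.put(col.getKey(), kept);
        }
        return result;
    }

    public static void main(String[] args) {
        MaxVersionsDemo t = new MaxVersionsDemo();
        t.put("cf:A", 1L, "a1");
        t.put("cf:B", 2L, "b1");
        t.put("cf:B", 3L, "b2");
        // A is still returned; only B's older version is cut off.
        System.out.println(t.get(1)); // {cf:A={1=a1}, cf:B={3=b2}}
    }
}
```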

> Is there a way to query HBase only for the highest timestamp regardless of
> the value of the column qualifier? In the above example, the highest
> timestamp for "rowid" is t2 with column qualifier B, but I get t1 and t2
> both back.

You would have to filter the qualifiers yourself. If instead you write
multiple times to the same qualifier, then it does return only the latest
version.
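Both options can be sketched on the client side as follows. This is plain Java over a hypothetical `Cell` class standing in for the cells of a fetched row, not the HBase API. Option 1 keeps qualifiers A and B and picks the newest cell across them; option 2 (a design change assumed here) writes every transition to a single qualifier, called `state` in this sketch, so the newest version of that one column is the current state.

```java
import java.util.Arrays;
import java.util.List;

class LatestCellDemo {
    // Hypothetical stand-in for one cell of a fetched row.
    static final class Cell {
        final String qualifier;
        final long timestamp;
        final String value;

        Cell(String qualifier, long timestamp, String value) {
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Option 1: separate qualifiers A and B; pick the newest cell on the client.
    static Cell latestAcrossQualifiers(List<Cell> rowCells) {
        Cell latest = null;
        for (Cell c : rowCells) {
            if (latest == null || c.timestamp > latest.timestamp) latest = c;
        }
        return latest; // null for an empty row
    }

    // Under option 1, "not yet transitioned" means the newest cell is still A.
    static boolean stillInStateA(List<Cell> rowCells) {
        Cell latest = latestAcrossQualifiers(rowCells);
        return latest != null && latest.qualifier.equals("A");
    }

    // Option 2: all transitions go to ONE qualifier, the state is the value,
    // and the newest version of that column is the current state.
    static String currentState(List<Cell> stateVersions) {
        Cell latest = latestAcrossQualifiers(stateVersions);
        return latest == null ? null : latest.value;
    }

    public static void main(String[] args) {
        List<Cell> twoQualifiers = Arrays.asList(new Cell("A", 1L, "v1"), new Cell("B", 2L, "v2"));
        System.out.println(latestAcrossQualifiers(twoQualifiers).qualifier); // B
        System.out.println(stillInStateA(twoQualifiers));                     // false

        List<Cell> stateColumn = Arrays.asList(new Cell("state", 1L, "A"), new Cell("state", 2L, "B"));
        System.out.println(currentState(stateColumn));                        // B
    }
}
```

Option 2 is usually simpler, since a plain get with the default of one version already returns only the current state. Finding every row still in state A would still mean scanning the whole table (for example with a `SingleColumnValueFilter` on `cf:state`), which is why an index is the usual answer for large tables.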
