hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: question about scanning data
Date Mon, 13 Jun 2011 19:30:05 GMT
I think you are confusing a few things; I'll try to clear this up inline.

J-D

On Fri, Jun 10, 2011 at 8:27 PM, Sam Seigal <selekt86@yahoo.com> wrote:
> Hi All,
>
> I had a question about a certain kind of query I would like to do in hbase.
>
> I am storing records in HBase that transition from an initial state "A" to
> an end state "B" .
>
> Initially, the record I will store will look like the following ->
>
> t1:rowid:columnFamily:A <value>
>
> when I get a notification that the state has changed, I will write the
> following value ->
>
> t2:rowid:columnFamily:B <value>
>
> I basically end up with two versions of the same row.

This is not how it works in HBase. Here you end up with one row that has
two columns, since A and B are different qualifiers. You only get
multiple versions when you write to the same qualifier more than once.
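
To make the distinction concrete, here is a minimal sketch in plain Java. It is a toy in-memory model, not the HBase client API, and the names RowSketch, put and latest are made up for illustration. It only shows that writes to different qualifiers create separate columns, while repeated writes to one qualifier stack up as versions.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model of a single row (hypothetical, not HBase code):
// qualifier -> (timestamp -> value), with the newest timestamp first.
class RowSketch {
    final Map<String, TreeMap<Long, String>> columns = new TreeMap<>();

    // A new qualifier creates a new column; an existing qualifier
    // gets one more version under the given timestamp.
    void put(String qualifier, long timestamp, String value) {
        columns.computeIfAbsent(qualifier,
                q -> new TreeMap<Long, String>(Comparator.reverseOrder()))
               .put(timestamp, value);
    }

    // Newest value stored under a qualifier, or null if the column is absent.
    String latest(String qualifier) {
        TreeMap<Long, String> versions = columns.get(qualifier);
        return versions == null ? null : versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        RowSketch row = new RowSketch();
        row.put("A", 1L, "initial");
        row.put("B", 2L, "final");
        System.out.println(row.columns.keySet());        // prints [A, B]
        System.out.println(row.columns.get("A").size()); // prints 1
    }
}
```

With this model, the two writes from the question give two columns holding one version each, which is why asking for a single version still returns both.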

>
> Now, I want to query for all the records that have NOT transitioned to state
> B yet.
>
> Is it possible to express a query in HBase where one can say "retrieve only
> row Id values where there exists a column qualifier A but not B" ?

This is called a secondary index, which HBase doesn't support out of
the box. Google for that and you should see a bunch of discussions on
the subject.
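
One common shape for such an index, sketched here in plain Java under the assumption that the application maintains it itself: keep a separate set of row ids that are still in state A, add to it when A is written and remove from it when B is written. The class and method names are hypothetical, and in a real deployment the set would more likely be a second HBase table than an in-memory structure. Keeping the index in step with the data is the application's job.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical application-maintained index of rows that have not
// reached state B yet. Not part of HBase.
class PendingIndexSketch {
    private final Set<String> pending = new TreeSet<>();

    // Call alongside the write of qualifier A for this row.
    void onStateA(String rowId) {
        pending.add(rowId);
    }

    // Call alongside the write of qualifier B for this row.
    void onStateB(String rowId) {
        pending.remove(rowId);
    }

    // Snapshot of the row ids that have A but no B.
    Set<String> rowsStillInA() {
        return new TreeSet<>(pending);
    }
}
```

The query "rows with A but not B" then becomes a read of the index instead of a scan over the whole table.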

>
> How can I do this ?
>
> I tried doing the following through the hbase shell. I had the following
> values stored:
>
> t1:rowid:cf:A
> t2:rowid:cf:B
>
>
> I did a query for "rowid" with VERSIONS => 1. However, this gives me both A
> and B qualifier values. I am only interested in values that have not yet
> transitioned to B.

Yep, VERSIONS => 1 applies per column, and since A and B are separate
columns with one version each, they both get returned.

>
> Is there a way to query HBase only for the highest timestamp regardless of
> the value of the column qualifier ? In the above example, the highest
> timestamp for "rowid" is t2 with column qualifier B, but I get t1 and t2
> both back.

You would have to filter the qualifiers yourself, but if you write
multiple times to the same qualifier then only the latest version is
returned.
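
Filtering the qualifiers yourself could look roughly like the sketch below. It is plain Java rather than HBase client code, and it assumes you have already pulled out, for one row, the newest timestamp of each qualifier into a map (the map shape and the name newestQualifier are made up for illustration). It picks the qualifier with the highest timestamp; a row is still in state A when the result is "A".

```java
import java.util.Map;

class LatestQualifierSketch {
    // Input (assumed shape): qualifier -> timestamp of its newest version.
    // Returns the qualifier with the highest timestamp, or null if empty.
    static String newestQualifier(Map<String, Long> newestTimestampByQualifier) {
        String best = null;
        long bestTimestamp = Long.MIN_VALUE;
        for (Map.Entry<String, Long> cell : newestTimestampByQualifier.entrySet()) {
            if (best == null || cell.getValue() > bestTimestamp) {
                best = cell.getKey();
                bestTimestamp = cell.getValue();
            }
        }
        return best;
    }
}
```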
