hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Question on timestamp, timeranges
Date Mon, 10 Oct 2011 20:51:11 GMT
On Thu, Oct 6, 2011 at 10:10 PM, Steinmaurer Thomas <
Thomas.Steinmaurer@scch.at> wrote:
> > Hard to tell without really knowing what you're trying to do, but my
> > default answer is no. If the timestamp is part of your data model, it
> > should be inside your row key or a column.
> It's part of our rowkey but due to scalability it's the last part of a
> three-part rowkey. e.g.:
> part1-part2-YYYYMMDDhhmmss
> This is perfect for our ad-hoc queries for part1/part2 for a given day
> via a web-front end.
> But what we are also trying to do is to process rows either via a
> client or a M/R-job which have been inserted e.g. yesterday for
> calculating daily aggregated values. As our timestamp is at the end of
> the rowkey, we thought about setting the timerange of a scanner object
> as filter criteria when starting the MR-Job. While not perfect it's
> better than doing a full scan of the table I guess.

Ah so this is sort of a secondary index problem...

The first thing that comes to mind: if you have a low number of part1 and
part2 combinations, you could run multiple parallel scans on those prefixes,
each bounded to the day you want. It could even be done inside MR.
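
A rough sketch of how the per-prefix scan boundaries could be computed for one
day, assuming the part1-part2-YYYYMMDDhhmmss key layout quoted above. The
function name and the sample prefixes are made up for illustration; each
(start, stop) pair would become the start row (inclusive) and stop row
(exclusive) of one scan:

```python
from datetime import date, timedelta


def day_scan_ranges(prefixes, day):
    """Return one (start_row, stop_row) pair per (part1, part2) prefix.

    Assumes keys look like part1-part2-YYYYMMDDhhmmss. The stop row is
    midnight of the following day, so it is meant to be used as an
    exclusive upper bound.
    """
    start_ts = day.strftime("%Y%m%d") + "000000"
    stop_ts = (day + timedelta(days=1)).strftime("%Y%m%d") + "000000"
    return [
        (f"{p1}-{p2}-{start_ts}".encode(), f"{p1}-{p2}-{stop_ts}".encode())
        for p1, p2 in prefixes
    ]


# Hypothetical prefixes; in practice these come from your own data.
ranges = day_scan_ranges([("a", "x"), ("b", "y")], date(2011, 10, 9))
```

Each pair can then be handed to its own scanner, or to its own input split if
you go the MR route.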

Another option is just to store everything twice, but having the date as the
first component in the row key is going to make your life miserable.
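
To illustrate why, here is a small sketch comparing the two key layouts. The
helper names and sample writes are invented; the only thing it relies on is
that rows are kept sorted by key:

```python
def primary_key(part1, part2, ts):
    # Layout from the quoted message: timestamp last.
    return f"{part1}-{part2}-{ts}"


def date_first_key(part1, part2, ts):
    # Hypothetical second copy with the timestamp leading.
    return f"{ts}-{part1}-{part2}"


# Made-up writes, listed in arrival order.
writes = [
    ("a", "x", "20111009120000"),
    ("b", "y", "20111009120001"),
    ("a", "x", "20111009120002"),
]

primary_sorted = sorted(primary_key(*w) for w in writes)
date_first_sorted = sorted(date_first_key(*w) for w in writes)

# With the date first, sort order equals arrival order, so every new row
# lands at the very end of the key space.
appends_at_end = date_first_sorted == [date_first_key(*w) for w in writes]
```

With the timestamp last, writes are spread across the part1/part2 prefixes;
with it first, they all pile up at the tail of the table.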
