hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: about HBase versions
Date Tue, 24 Nov 2009 22:38:00 GMT
Right now the implementation will return an indeterminate version when
there are duplicates with the same timestamp. If they happen to have
the same value, you are ok.

I think there are a few other gotchas with regard to compaction.
Each timestamp counts as only one version, so depending on how many
duplicates you have, you may end up with more data than you intended.
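
To make the version-counting point concrete, here is a toy model in plain
Java. It is not HBase code and does not reflect how compaction is actually
implemented; the class VersionCountSketch, the Cell type, and the retain
method are all hypothetical names made up for illustration. It only assumes
what is said above: versions are counted per distinct timestamp, so
duplicates at the same timestamp do not use up the version limit.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: NOT HBase code. All names here are hypothetical.
final class VersionCountSketch {

    /** Hypothetical stand-in for one stored value of a column at a timestamp. */
    static final class Cell {
        final long timestamp;
        final String value;

        Cell(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /**
     * Walks cells that are assumed to be sorted newest first and keeps them
     * until a new timestamp would exceed maxVersions distinct timestamps.
     * Duplicates that share an already-seen timestamp are kept as well,
     * which is the assumption being illustrated.
     */
    static List<Cell> retain(List<Cell> newestFirst, int maxVersions) {
        List<Cell> kept = new ArrayList<>();
        Set<Long> seenTimestamps = new HashSet<>();
        for (Cell cell : newestFirst) {
            boolean isNewTimestamp = !seenTimestamps.contains(cell.timestamp);
            if (isNewTimestamp && seenTimestamps.size() == maxVersions) {
                break; // version limit reached, drop this and all older cells
            }
            seenTimestamps.add(cell.timestamp);
            kept.add(cell);
        }
        return kept;
    }
}
```

In this model, a column holding cells at timestamps 30, 30, 20, 20, 10 with
a limit of 2 versions keeps four cells rather than two, because each pair of
duplicates counts as a single version.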

On Tue, Nov 24, 2009 at 2:35 PM, Zhenyu Zhong <zhongresearch@gmail.com> wrote:
> Hi,
>
> I would like to use a nice feature of HBase, versions, to store time-series
> data for a rowkey.
> However, I get duplicates for the same rowkey and the same timestamp if I
> use Put and run the MapReduce job multiple times.
>
> For example.
> Put put = new Put(rowkey.getBytes());
> put.add("f1:c1".getBytes(), ts, value.getBytes());
>
> I use TableOutputFormat as the output of the MapReduce job.
> If I run the MapReduce job twice, I would get 2 records with the same rowkey
> and same timestamp.
>
> May I ask whether the Put just adds a row even if there is already a
> row with the same key and timestamp in the table?
>
> Best,
> zhenyu
>
