hbase-user mailing list archives

From Toby White <toby.o.h.wh...@googlemail.com>
Subject Re: duplicated hbase timestamps
Date Fri, 12 Dec 2008 00:39:38 GMT
Sorry for the very slow response - local priorities changed and I  
didn't have a chance to respond properly before.

The issue described previously is still occurring. Brief recap: HBase
is reporting cells with duplicate timestamps; see the quoted output
below.

I originally saw this with 0.18.0. I've now checked, and I still see
it with 0.18.1 (both on Hadoop 0.18.1) and with current trunk
(r725828, with HDFS upgraded to run on Hadoop 0.19.0).

This is running in pseudo-distributed mode.

I've been able to narrow down the trigger a bit. I can't reproduce it
every time, but it seems to occur only when I've done the following:

* Create a row.
* Add lots of data at different timestamps into one column (thrift
mutateRowTs or shell put).
* Delete all data in that column, or indeed the entire row (thrift
deleteAll or deleteAllRow, or shell deleteall).
* At this point, HBase reports that the row has indeed been removed.
* Recreate the row and put data back into the same column, at the
same timestamps but with potentially different values (thrift
mutateRowTs or shell put).
* On reading the row back (thrift getVer or shell get), HBase reports
both the newly added data and the data previously deleted.
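
For anyone who wants to try this, the sequence above can be sketched
roughly as below. This is only an illustration under assumptions: the
VersionedStore interface, its put / delete_row / get_versions methods,
InMemoryStore and run_cycle are all hypothetical names, not HBase or
thrift APIs. A real test would back the interface with the thrift calls
named in the list (mutateRowTs, deleteAllRow, getVer). The in-memory
class only models the behaviour one would expect, so the script runs
without a server.

```python
from typing import Dict, List, Protocol, Tuple

Cell = Tuple[int, str]  # (timestamp in milliseconds, value)


class VersionedStore(Protocol):
    """Hypothetical wrapper interface; not an HBase or thrift API."""

    def put(self, row: str, column: str, timestamp: int, value: str) -> None: ...
    def delete_row(self, row: str) -> None: ...
    def get_versions(self, row: str, column: str, max_versions: int) -> List[Cell]: ...


class InMemoryStore:
    """Stand-in that models the expected behaviour: deleted data stays deleted."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, Dict[int, str]]] = {}

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        # One value per (row, column, timestamp); a second put overwrites.
        self._rows.setdefault(row, {}).setdefault(column, {})[timestamp] = value

    def delete_row(self, row: str) -> None:
        self._rows.pop(row, None)

    def get_versions(self, row: str, column: str, max_versions: int) -> List[Cell]:
        cells = self._rows.get(row, {}).get(column, {})
        newest_first = sorted(cells.items(), reverse=True)
        return newest_first[:max_versions]


def run_cycle(store: VersionedStore, row: str, column: str,
              cells: List[Cell]) -> List[Cell]:
    """Write, delete, rewrite at the same timestamps, then read back."""
    for ts, value in cells:
        store.put(row, column, ts, value)
    store.delete_row(row)
    if store.get_versions(row, column, len(cells)):
        raise AssertionError("row still visible after delete")
    for ts, value in cells:
        store.put(row, column, ts, value + "-new")
    # Request more versions than were written so any extras would show up.
    return store.get_versions(row, column, 10 * len(cells))
```

Against a store that behaves correctly, run_cycle returns exactly
len(cells) cells, all carrying the rewritten values. Anything longer
than that is the symptom described here.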

On a row where this has happened once, it seems to happen almost  
always thereafter, each time appending a whole new set of data. So, if  
you're adding/removing 100 cells at a time, then the total number of  
cells hbase reports back will grow by 100 every time you repeat the  
cycle.
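
One way to quantify this from the client side is to count how often
each timestamp comes back in a single read. The helper below is a
minimal sketch; it assumes the cells have already been converted to
(timestamp, value) pairs, and duplicate_timestamps is an illustrative
name rather than part of any HBase client.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def duplicate_timestamps(cells: Iterable[Tuple[int, str]]) -> Dict[int, int]:
    """Return {timestamp: count} for every timestamp seen more than once."""
    counts = Counter(ts for ts, _value in cells)
    return {ts: n for ts, n in counts.items() if n > 1}


# First three cells from the quoted shell output further down.
sample = [
    (1224504133000, "1013.0"),
    (1224502749000, "1012.0"),
    (1224502749000, "1012.0"),
]
print(duplicate_timestamps(sample))  # {1224502749000: 2}
```

An empty result means every timestamp was reported once. On an
affected row, the counts would be expected to rise with each
delete-and-rewrite cycle.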

On a row where it hasn't happened yet, deletes usually seem to behave
correctly.

The problem is observable either through the Python thrift interface
or directly through the HBase JRuby shell.

The HDFS filesystem on which I'm observing this is fairly small -
under 100 MB compressed - so I can forward it off-list for debugging
if that's helpful. I'd be grateful for any help sorting this out.

Toby

On 20 Oct 2008, at 17:51, Jean-Daniel Cryans wrote:

> Toby,
>
> Can you tell us more about your setup? Numbers of machines, if NTP is
> installed and running, number of regions in your table and other
> useful stuff.
>
> Thx,
>
> J-D
>
> On Mon, Oct 20, 2008 at 11:11 AM, Toby White
> <toby.o.h.white@googlemail.com> wrote:
>
>> I'm seeing a strange effect on my hbase instance. Sometimes, on
>> requesting the full history of a column, I get back individual cells
>> several times over.
>>
>> That is, I'm getting results like this:
>>
>> base(main):006:0* get 'my_table', 'scw9npU7Q4ma_khXqlDGXg', {COLUMN => 'value:', VERSIONS => 4000}
>> timestamp=1224504133000, value=1013.0
>> timestamp=1224502749000, value=1012.0
>> timestamp=1224502749000, value=1012.0
>> timestamp=1224499880000, value=1011.0
>> timestamp=1224499880000, value=1011.0
>> timestamp=1224499880000, value=1011.0
>> timestamp=1224415961000, value=1010.0
>> timestamp=1224415961000, value=1010.0
>> timestamp=1224415961000, value=1010.0
>> timestamp=1224415701000, value=1009.0
>> timestamp=1224415701000, value=1009.0
>> timestamp=1224415701000, value=1009.0
>> timestamp=1224414200000, value=1008.0
>> timestamp=1224414200000, value=1008.0
>> timestamp=1224414200000, value=1008.0
>>
>> This happens both through the hbase shell as shown here, and when
>> communicating with the server via thrift.
>>
>> In either case, the cells are reported either as shown above (that
>> is, with each cell simply repeated several times, in this case 3) or
>> sometimes with the series repeated, something like this:
>>
>> base(main):006:0* get 'golddigger', 'scw9npU7Q4ma_khXqlDGXg', {COLUMN => 'value:', VERSIONS => 4000}
>> timestamp=1224504133000, value=1013.0
>> timestamp=1224502749000, value=1012.0
>> timestamp=1224499880000, value=1011.0
>> timestamp=1224415961000, value=1010.0
>> timestamp=1224415701000, value=1009.0
>> timestamp=1224414200000, value=1008.0
>> timestamp=1224504133000, value=1013.0
>> timestamp=1224502749000, value=1012.0
>> timestamp=1224499880000, value=1011.0
>> timestamp=1224415961000, value=1010.0
>> timestamp=1224415701000, value=1009.0
>> timestamp=1224414200000, value=1008.0
>>
>> or sometimes a combination of both, i.e. an entire series with each
>> cell repeated a couple of times, and then the whole lot repeated again.
>>
>> This doesn't happen with all rows, only some of them, apparently at
>> random. Sometimes, restarting hbase & the underlying hdfs makes the
>> problem go away; sometimes, it doesn't, and the issue persists.
>>
>> This is with hbase 0.18.0 on hadoop 0.18.1
>>
>> Is this a known issue?
>>

