hbase-user mailing list archives

From "鞠適存" <chihchun....@gmail.com>
Subject data duplicate?
Date Fri, 28 Nov 2008 03:31:48 GMT
Hi,

I adapted the "Bulk Import" sample code written by Allen Day to upload a
flat data file to an HBase table.
My table schema is: <row key> <ColFamily1:colKey> <ColFamily2:colKey>.
The table description reported by the HBase shell is as follows:
{NAME => 'ATCGeo', IS_ROOT => 'false', IS_META => 'false', FAMILIES => [
{NAME => 'photo_id', BLOOMFILTER => 'false', VERSIONS => '30000', COMPRESSION => 'NONE', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'true', BLOCKCACHE => 'true'},
{NAME => 'trail_id', BLOOMFILTER => 'false', VERSIONS => '30000', COMPRESSION => 'NONE', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'true', BLOCKCACHE => 'true'}]}

Some of the data appears to be duplicated: the same content is stored under
different timestamps. For example, when I run:
get '<table>', '<rowkey>',{COLUMN=>'col1',VERSION=>30000}
the results are:
timestamp=3090896685592411,
value=/media/streetimage/processed/streettester/2008_08_07_12_26_21_C/2265.jpg

timestamp=3090896682597411,
value=/media/streetimage/processed/streettester/2008_08_07_12_26_21_C/2264.jpg

timestamp=3090731558521386,
value=/media/streetimage/processed/streettester/2008_08_07_12_26_21_C/2265.jpg

timestamp=3090731556503386,
value=/media/streetimage/processed/streettester/2008_08_07_12_26_21_C/2264.jpg
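
To make the duplication easier to see, here is a minimal sketch in plain Python (not HBase client code) that groups the four cells listed above by value and reports every value stored under more than one timestamp. The `cells` list and the `find_duplicates` helper are hypothetical names used only for illustration; the data is copied from the shell output.

```python
from collections import defaultdict

# (timestamp, value) pairs copied from the shell output above.
# The variable and function names are illustrative, not part of HBase.
PREFIX = "/media/streetimage/processed/streettester/2008_08_07_12_26_21_C/"
cells = [
    (3090896685592411, PREFIX + "2265.jpg"),
    (3090896682597411, PREFIX + "2264.jpg"),
    (3090731558521386, PREFIX + "2265.jpg"),
    (3090731556503386, PREFIX + "2264.jpg"),
]


def find_duplicates(cells):
    """Return {value: sorted timestamps} for values seen under more than one timestamp."""
    by_value = defaultdict(list)
    for timestamp, value in cells:
        by_value[value].append(timestamp)
    return {v: sorted(ts) for v, ts in by_value.items() if len(ts) > 1}


duplicates = find_duplicates(cells)
for value, timestamps in sorted(duplicates.items()):
    print(value, timestamps)
```

Both file names show up twice here, each under two different timestamps.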

I am sure that the data in the original file is unique. Could anyone tell me
what the possible reasons are?
I would appreciate any help!

Chu
