hbase-user mailing list archives

From Anders Ossowicki <...@vmn.dk>
Subject Re: ValueFilter finding old versions of cells
Date Tue, 23 Jan 2018 15:33:35 GMT
Hi,

> If I understand your question correctly, if you're interested in getting the value FOO
> you should change your value filter as below

Not quite. I want to get no results at all.

I would expect the ValueFilter scan with 'binaryprefix:foo' to not
return any cells, since the cell it found was not within the last N
versions (where N=1 in my example).
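
To make the expectation concrete, here is a toy model in plain Python. It is
not HBase code, and the Cell type and both helper functions are names I made
up for illustration. It assumes the filter is evaluated against every stored
cell before the version limit is applied, which is my reading of what the
scan is doing, and compares that with the ordering I expected.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    # Hypothetical stand-in for a stored cell; not an HBase class.
    row: str
    column: str
    timestamp: int
    value: bytes


def newest_versions(cells, max_versions=1):
    """Keep only the newest max_versions cells per (row, column)."""
    kept = []
    seen = {}
    # Sort so that, within each column, the newest timestamp comes first.
    for cell in sorted(cells, key=lambda c: (c.row, c.column, -c.timestamp)):
        key = (cell.row, cell.column)
        seen[key] = seen.get(key, 0) + 1
        if seen[key] <= max_versions:
            kept.append(cell)
    return kept


def prefix_filter(cells, prefix):
    """Rough analogue of ValueFilter(=, 'binaryprefix:...')."""
    return [c for c in cells if c.value.startswith(prefix)]


# The two puts from my example, both still on disk before a major compaction.
cells = [
    Cell("foo", "f1:a", 1516659851593, b"foo"),
    Cell("foo", "f1:a", 1516659855024, b"FOO"),
]

# Assumed current behaviour: filter first, then apply VERSIONS => 1.
filter_first = newest_versions(prefix_filter(cells, b"foo"))

# What I expected: apply VERSIONS => 1 first, then filter.
versions_first = prefix_filter(newest_versions(cells), b"foo")
```

In this model filter_first still contains the old 'foo' cell, while
versions_first is empty, which is the result I was hoping to get from the
scan.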

As it is now, I can't really trust that the result of a scan with a
ValueFilter is representative of the state of the table, so I would
need to verify that none of the returned cells have a more recent
version with a different value. This is the problem I would expect
VERSIONS => 1 to get around.
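
A rough sketch of that verification step, again in plain Python rather than
against the HBase client API. The drop_stale function, the tuple layout and
the 'bar' row are all hypothetical; it assumes I already have the filtered
cells plus the newest cell per column from a second, unfiltered read.

```python
def drop_stale(filtered_cells, latest_cells):
    """Keep filtered cells only if they are the newest version of their column.

    Both arguments are iterables of (row, column, timestamp, value) tuples.
    latest_cells is assumed to hold exactly one cell per (row, column):
    the newest one, as an unfiltered scan with VERSIONS => 1 would return.
    """
    newest_ts = {(row, col): ts for row, col, ts, _value in latest_cells}
    return [
        cell for cell in filtered_cells
        if newest_ts.get((cell[0], cell[1])) == cell[2]
    ]


# Cells returned by the ValueFilter scan (the 'bar' row is made up).
filtered = [
    ("foo", "f1:a", 1516659851593, b"foo"),
    ("bar", "f1:a", 1516659860000, b"food"),
]

# Newest cell per column from an unfiltered read of the same rows.
latest = [
    ("foo", "f1:a", 1516659855024, b"FOO"),
    ("bar", "f1:a", 1516659860000, b"food"),
]

trusted = drop_stale(filtered, latest)
```

Here trusted keeps only the 'bar' cell, because the 'foo' match has been
superseded by a newer version. It works, but it costs a second read per
matching row, which is exactly the extra work I'd like to avoid.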

And yes, a major compaction will clean up the old cells, but that still
gives me a (large) window where I'm getting junk back from a scan.

I'm wondering if there's a way around this, so I can avoid filtering
on the client side. I'd much rather let HBase do that, with its
parallelization.

On 23 January 2018 at 01:00, naresh Goud <nareshgoud.dulam@gmail.com> wrote:
> Hi,
>
> If I understand your question correctly, if you're interested in getting the
> value FOO you should change your value filter as below
>
> scan 't1', { COLUMNS => 'f1:a', FILTER => "ValueFilter( =, 'binaryprefix:FOO' )" }
> instead of 'binaryprefix:foo'
>
> If you query after a major compaction, then your query with the value filter
> 'binaryprefix:foo' won't return any result.
>
>
>
> Thank you,
> Naresh
>
>
>
>
>
> On Mon, Jan 22, 2018 at 4:57 PM, Anders Ossowicki <and@vmn.dk> wrote:
>>
>> Hi,
>>
>> When doing a scan with a ValueFilter, I get an old cell value out,
>> even with VERSIONS => 1 set for the table.
>>
>> hbase(main):003:0> create 't1', 'f1'
>> 0 row(s) in 1.8020 seconds
>> hbase(main):005:0> put 't1', 'foo', 'f1:a', 'foo'
>> 0 row(s) in 0.1260 seconds
>> hbase(main):006:0> put 't1', 'foo', 'f1:a', 'FOO'
>> 0 row(s) in 0.0070 seconds
>> hbase(main):001:0> scan 't1'
>> ROW                                                   COLUMN+CELL
>>  foo                                                  column=f1:a, timestamp=1516659855024, value=FOO
>> 1 row(s) in 0.2260 seconds
>> hbase(main):002:0> scan 't1', { COLUMNS => 'f1:a', FILTER => "ValueFilter( =, 'binaryprefix:foo' )" }
>> ROW                                                   COLUMN+CELL
>>  foo                                                  column=f1:a, timestamp=1516659851593, value=foo
>> 1 row(s) in 0.0600 seconds
>> hbase(main):003:0> describe 't1'
>> Table t1 is ENABLED
>> t1
>> COLUMN FAMILIES DESCRIPTION
>> {NAME => 'f1', BLOOMFILTER => 'ROW', VERSIONS => '1', IN_MEMORY =>
>> 'false', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE',
>> TTL => 'FOREVER', COMPRESSION => 'NONE', MIN_VERSIONS => '0',
>> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '0'}
>>
>> This is on HBase 1.1.2 as shipped by HortonWorks.
>>
>> My understanding is that this will happen as long as there hasn't been
>> a major compaction to clean up old cell versions.
>>
>> I'm wondering if I'm missing an obvious way to get what I want (only
>> cells that would survive a major compaction), possibly one that would
>> just work when VERSIONS => 1, or if I'll just have to do the scan
>> without a ValueFilter, and filter the data on the client side.
>>
>> --
>> Anders Ossowicki
>
>



-- 
Anders Ossowicki
