hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Problems with scan after lot of Puts
Date Wed, 30 May 2012 19:05:25 GMT
There you go:

12/05/30 18:54:17 DEBUG client.MetaScanner: Scanning .META. starting at row=testtable,,00000000000000 for max=10 rows using org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@f593af
12/05/30 18:54:17 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for testtable,test_row_0496107,1338404055995.e9c7a4ca97eb2be372445af4d3772031. is sv4r25s44:62023
12/05/30 18:54:17 DEBUG client.HConnectionManager$HConnectionImplementation: Removed testtable,,1338404055995.9389fe5538f19a6f2df27e3958dcb434. for tableName=testtable from cache because of test_row_0012550
12/05/30 18:54:17 DEBUG client.HConnectionManager$HConnectionImplementation: Cached location for testtable,,1338404055995.9389fe5538f19a6f2df27e3958dcb434. is sv4r25s44:62023
12/05/30 18:57:47 INFO hbase.TestPutScan: Run 5 scan
12/05/30 18:57:47 ERROR hbase.TestPutScan: Expected value: value 0000001 0000005, got: value 0496107 0000005

That's a region split, so the ClientScanner reset the start row: the
new region starts at test_row_0496107, which is the row your scan got
first. I'm going to fix your code and see if I can hit anything else.
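
A minimal sketch of why a reused scan description can make the next scan start in the wrong place. It deliberately does not use the HBase client: ScanSpec, scan(), and the row and split keys are hypothetical stand-ins, and the "reset on split" is reduced to a single field assignment. It only models the aliasing problem described in HBASE-4891, not the actual ClientScanner code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class ScanReuseSketch {

    /** Hypothetical stand-in for a mutable scan description (not the HBase Scan class). */
    static final class ScanSpec {
        String startRow;
        ScanSpec(String startRow) { this.startRow = startRow; }
        ScanSpec copy() { return new ScanSpec(startRow); }
    }

    /**
     * Simplified model of a scanner: when it crosses the split key it moves the
     * start row of the spec it works on. Without a defensive copy, that change
     * leaks back into the caller's object.
     */
    static List<String> scan(TreeSet<String> rows, String splitKey,
                             ScanSpec spec, boolean defensiveCopy) {
        ScanSpec working = defensiveCopy ? spec.copy() : spec;
        List<String> seen = new ArrayList<>();
        for (String row : rows.tailSet(working.startRow, true)) {
            if (row.equals(splitKey)) {
                working.startRow = splitKey; // models the start-row reset on a split
            }
            seen.add(row);
        }
        return seen;
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(
                Arrays.asList("row_1", "row_2", "row_3", "row_4", "row_5", "row_6"));

        // Reused spec, no copy: the second scan starts at the split key.
        ScanSpec shared = new ScanSpec("");
        scan(rows, "row_4", shared, false);
        System.out.println("reused: " + scan(rows, "row_4", shared, false).get(0));

        // Same reuse pattern, but the scanner copies the spec first.
        ScanSpec safe = new ScanSpec("");
        scan(rows, "row_4", safe, true);
        System.out.println("copied: " + scan(rows, "row_4", safe, true).get(0));
    }
}
```

On the caller side, the equivalent workaround is to build a fresh scan description for every scan instead of keeping one around between runs.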

J-D

On Wed, May 30, 2012 at 11:56 AM, Jean-Daniel Cryans
<jdcryans@apache.org> wrote:
> I'm running it here, but I just remembered about this issue:
>
> "HTable.ClientScanner needs to clone the Scan object"
> https://issues.apache.org/jira/browse/HBASE-4891
>
> And since you are reusing that Scan object, you could definitely hit this issue.
>
> J-D
>
> On Tue, May 29, 2012 at 11:37 PM, Ondřej Stašek
> <ondrej.stasek@firma.seznam.cz> wrote:
>> Here it is:
>>
>> http://pastebin.com/0AgsQjur
>>
>>
>> On 29.5.2012 22:44, Jean-Daniel Cryans wrote:
>>>
>>> Care to share that TestPutScan? Just put it in a pastebin.
>>>
>>> Thx,
>>>
>>> J-D
>>>
>>> On Tue, May 29, 2012 at 6:13 AM, Ondřej Stašek
>>> <ondrej.stasek@firma.seznam.cz>  wrote:
>>>>
>>>> My program writes changes to an HBase table by issuing lots of Puts
>>>> (autoCommit turned off, flush at the end) and afterwards uses a
>>>> ResultScanner on the whole table to read all rows and act upon them.
>>>> My problem is that on several occasions the scan does not return the
>>>> expected rows. Either the scan does not start at the beginning of the
>>>> table, or somewhere during the scan I get old data (not the data
>>>> written by the preceding Puts).
>>>>
>>>> I have even written a simple test application to simulate this behavior:
>>>> 1. write 1M simple numbered rows to a table
>>>> 2. scan through table to test output, delete every 10th row
>>>> 3. scan again after delete
>>>> 4. repeat until error found
>>>>
>>>> Sample output:
>>>>
>>>> 12/05/29 00:32:12 INFO hbase.TestPutScan: Run 342 put 1000000 rows
>>>> 12/05/29 00:32:35 INFO hbase.TestPutScan: Run 342 scan + del every 10th row
>>>> 12/05/29 00:33:29 INFO hbase.TestPutScan: Run 342 scan
>>>> 12/05/29 00:33:29 ERROR hbase.TestPutScan: Expected value: value 0000001 0000342, got: value 0281999 0000342
>>>>
>>>> This means that the program expected to get the first row, but got
>>>> the 281999th.
>>>>
>>>> This test ran on a "minicluster" of 2 regionservers running Cloudera's
>>>> cdh3u4 distribution.
>>>>
>>>> Today I got 3 errors like that, and from the RS's log it seems that at
>>>> the same time the HBase balancer issued a reassign command for this
>>>> table's region (the table has only 1 region).
>>>>
>>>> Any pointers on what to check or what to send you to help resolve
>>>> this issue?
>>>>
>>>> Regards
>>>>
>>>> Ondrej Stasek
>>>>
>>
>>
>> --
>> Ondřej Stašek
>> Programátor senior
>> Seznam.cz, a.s.
>> Nádražní 159/21
>> 370 01 České Budějovice 6
>>
>> tel.: +420 386 325 467
>> gsm: +420 603 857 602
>> icq: 164660005
>> ondrej.stasek@firma.seznam.cz
>> http://www.seznam.cz
>>
