hbase-user mailing list archives

From Tim Robertson <timrobertson...@gmail.com>
Subject Re: multiple scanners on same table will cause problem? Scan results change among different tries.
Date Thu, 22 Apr 2010 16:57:01 GMT
Attached is a quickly hacked test for parallel scanning threads.  You
might want to increase the amount of data in the test though to test
properly.
It seems to pass consistently for me.

Note it uses a shared HTable object across threads, but the API states:
"Used to communicate with a single HBase table. This class is not
thread safe for writes. Gets, puts, and deletes take out a row lock
for the duration of their operation. Scans (currently) do not respect
row locking."

But I am not doing any writes in the test.

Cheers,
Tim
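
For readers following along, here is a minimal sketch of the kind of check discussed in this thread: give every row a unique integer, run many unfiltered scans in parallel, and compare the sum each scan sees. It is not the attached test and it does not talk to HBase. The in-memory sorted map, the class name ParallelScanCheck, and the helpers buildTable and runScans are all assumptions made so the example runs on its own (Java 8 or later); a real test would read the rows through HTable and a scanner instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelScanCheck {

    // Hypothetical stand-in for a small single-region table:
    // row key -> unique integer value (1, 2, 3, ...).
    static ConcurrentSkipListMap<String, Long> buildTable(int rows) {
        ConcurrentSkipListMap<String, Long> table = new ConcurrentSkipListMap<>();
        for (long i = 1; i <= rows; i++) {
            table.put(String.format("row-%08d", i), i);
        }
        return table;
    }

    // Runs `threads` unfiltered scans in parallel over one shared, read-only
    // table and returns the sum that each scan observed.
    static List<Long> runScans(int rows, int threads) throws Exception {
        final ConcurrentSkipListMap<String, Long> table = buildTable(rows);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> scans = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                scans.add(() -> {
                    long sum = 0;
                    for (long value : table.values()) {
                        sum += value;
                    }
                    return sum;
                });
            }
            List<Long> sums = new ArrayList<>();
            for (Future<Long> f : pool.invokeAll(scans)) {
                sums.add(f.get());
            }
            return sums;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int rows = 100_000;
        // With values 1..rows, every complete scan must see rows * (rows + 1) / 2.
        long expected = (long) rows * (rows + 1) / 2;
        for (long sum : runScans(rows, 20)) {
            if (sum != expected) {
                throw new AssertionError("scan saw " + sum + ", expected " + expected);
            }
        }
        System.out.println("all scans agree");
    }
}
```

Because nothing writes to the table while the scans run, every scan should return the same sum. Any mismatch in the equivalent HBase test would point at the read path rather than at concurrent writes.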



On Thu, Apr 22, 2010 at 4:22 PM, Michael Segel
<michael_segel@hotmail.com> wrote:
>
>
> Tim,
>
> Even without his code, this should be pretty straightforward to duplicate.
>
> Create the table with a sequence as a column in a column family.
> Then write a non-m/r job that has multiple threads that connect to
> HBase and see what they get when they hit the small table in a single region.
>
> If you can duplicate the problem, that would be the test code for the jira.
>
> -Mike
>
>> Date: Thu, 22 Apr 2010 16:13:31 +0200
>> Subject: Re: multiple scanners on same table will cause problem? Scan results change among different tries.
>> From: timrobertson100@gmail.com
>> To: hbase-user@hadoop.apache.org
>>
>> Could you please post your code that is doing the scanning, Steven?
>>
>>
>>
>> On Thu, Apr 22, 2010 at 3:50 PM, Michael Segel
>> <michael_segel@hotmail.com> wrote:
>> >
>> > Ok...
>> >
>> > This is something that I think we'll need input from a major contributor...
>> >
>> > It looks like there may be an issue with respect to row locking...
>> >
>> > I guess the questions to ask are:
>> >
>> > - How does HBase handle row level locking?
>> > - Concurrent reads/fetches of the same row?
>> >
>> > To be honest and fair, HBase is still an immature product when compared to databases,
>> > and there are going to be some issues that need to be fleshed out. (Let's see where we
>> > are in 20+ years ;-)
>> >
>> > I wish I knew more about the internals of HBase, but there are only so many
>> > hours in the day and my wife forces me to work so I can keep up with her spending. ;-)
>> > (And if any of you happen to ever meet her, please don't bring this up, she'll kill me. :-D)
>> >
>> > Let's see what St.Ack or Andrew have to say. This might be a JIRA issue.
>> >
>> > Thx
>> >
>> > -Mike
>> >
>> >
>> >
>> >> Date: Thu, 22 Apr 2010 20:17:12 +0800
>> >> Subject: Re: multiple scanners on same table will cause problem? Scan results change among different tries.
>> >> From: steven.zhuang.1984@gmail.com
>> >> To: hbase-user@hadoop.apache.org
>> >>
>> >> hi, Michael,
>> >>
>> >> Sorry for not making the question clear. There are multiple
>> >> scanners scanning a single table, and there may be cases where multiple
>> >> scanners read from a single region.
>> >> Please see answers inline.
>> >>
>> >> On Thu, Apr 22, 2010 at 8:08 PM, Michael Segel <michael_segel@hotmail.com> wrote:
>> >>
>> >> >
>> >> > I'm sorry, but are you trying to say that you have multiple scanners trying
>> >> > to read from a single region and the result sets do not match?
>> >> >
>> >> Yes, the result sets do not match.
>> >>
>> >> > I guess it would be an easy test: enter a bunch of rows into a region and
>> >> > have a unique integer for each row (1, 2, 3, ...).
>> >> > Then run a bunch of unfiltered scans in parallel, and generate a sum from
>> >> > each scan. If any of the sums do not match, then you have a potential issue
>> >> > with concurrency/row locking and row isolation level. How does HBase handle
>> >> > row level locking and isolation levels?
>> >> >
>> >> I have iterated over the rows/column families/cells and printed the content of
>> >> each cell, and found that some cells are missing in some scan result sets.
>> >>
>> >> > -Mike
>> >> >
>> >> > > Date: Thu, 22 Apr 2010 17:07:47 +0800
>> >> > > Subject: multiple scanners on same table will cause problem? Scan results change among different tries.
>> >> > > From: steven.zhuang.1984@gmail.com
>> >> > > To: hbase-user@hadoop.apache.org
>> >> > >
>> >> > > hi, All,
>> >> > > Has anybody scanned one table using multiple scanners at the
>> >> > > same time and found inconsistent results?
>> >> > > I am querying a table using dozens (20-120) of scanners in
>> >> > > parallel (multiple threads), trying to take advantage of the multiple
>> >> > > cores. But I found the scan results are not consistent across several runs.
>> >> > > I have checked my code, and there seems to be no bug in it. So I guess the
>> >> > > problem may come from HBase itself.
>> >> > > My HBase version is 0.20.3.
>> >> >
>> >> >
>> >
>
