hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject RE: multiple scanners on same table will cause problem? Scan results change among different tries.
Date Thu, 22 Apr 2010 20:11:42 GMT

Thanks Tim,

I suspect that it should work unless you get so many connections trying to hit the same region
that you overwhelm its ability to handle the scans properly.
(Or there was a problem in the OP's code)

Scans should be 'dirty reads' imho.


-Mike
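
The check described in the quoted messages below (load rows that each carry a unique integer, run many unfiltered scans in parallel, compare the sums) can be sketched roughly as follows. This is only a minimal sketch under stated assumptions, not the test Tim attached: the class and method names are hypothetical, and the "table" is an in-memory sorted map standing in for a single region so the harness runs without a cluster. Against a real cluster, the scan callable would instead open a scanner on the table and sum the sequence column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical harness: every name here is illustrative, not part of HBase.
final class ParallelScanCheck {

    // Builds the stand-in table: row keys "row-000001".."row-N", value = 1..N.
    static NavigableMap<String, Long> buildTable(int rows) {
        NavigableMap<String, Long> table = new ConcurrentSkipListMap<>();
        for (int i = 1; i <= rows; i++) {
            table.put(String.format("row-%06d", i), (long) i);
        }
        return table;
    }

    // One full, unfiltered "scan": walk every row and sum the integer column.
    static long scanAndSum(NavigableMap<String, Long> table) {
        long sum = 0;
        for (long value : table.values()) {
            sum += value;
        }
        return sum;
    }

    // Runs the same scan on `scanners` threads at once and collects each sum.
    static List<Long> runParallel(Callable<Long> scan, int scanners)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(scanners);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int i = 0; i < scanners; i++) {
                tasks.add(scan);
            }
            List<Long> sums = new ArrayList<>();
            for (Future<Long> result : pool.invokeAll(tasks)) {
                sums.add(result.get());
            }
            return sums;
        } finally {
            pool.shutdown();
        }
    }

    // True only if every scan produced the expected sum.
    static boolean allEqual(List<Long> sums, long expected) {
        for (long sum : sums) {
            if (sum != expected) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        int rows = 10_000;
        NavigableMap<String, Long> table = buildTable(rows);
        long expected = (long) rows * (rows + 1) / 2; // 1 + 2 + ... + N
        List<Long> sums = runParallel(() -> scanAndSum(table), 20);
        System.out.println(allEqual(sums, expected)
                ? "all scans agree"
                : "MISMATCH: " + sums);
    }
}
```

With no concurrent writes, any sum that differs from N(N+1)/2 would point to rows or cells dropped by a scan, which is the symptom reported in this thread.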

> Date: Thu, 22 Apr 2010 18:57:01 +0200
> Subject: Re: multiple scanners on same table will cause problem? Scan results change among different tries.
> From: timrobertson100@gmail.com
> To: hbase-user@hadoop.apache.org
> 
> Attached is a quickly hacked test for parallel scanning threads.  You
> might want to increase the amount of data in the test though to test
> properly.
> It seems to pass consistently for me.
> 
> Note it uses a shared HTable object across threads, but the API states:
> "Used to communicate with a single HBase table. This class is not
> thread safe for writes. Gets, puts, and deletes take out a row lock
> for the duration of their operation. Scans (currently) do not respect
> row locking."
> 
> But I am not doing any writes in the test.
> 
> Cheers,
> Tim
> 
> 
> 
> On Thu, Apr 22, 2010 at 4:22 PM, Michael Segel
> <michael_segel@hotmail.com> wrote:
> >
> >
> > Tim,
> >
> > Even without his code, this should be pretty straightforward to duplicate.
> >
> > Create the table with a sequence as a column in a column family.
> > Then write a non-m/r job that has multiple threads that connect to
> > HBase and see what they get when they hit the small table in a single region.
> >
> > If you can duplicate the problem, that would be the test code for the jira.
> >
> > -Mike
> >
> >> Date: Thu, 22 Apr 2010 16:13:31 +0200
> >> Subject: Re: multiple scanners on same table will cause problem? Scan results change among different tries.
> >> From: timrobertson100@gmail.com
> >> To: hbase-user@hadoop.apache.org
> >>
> >> Could you please post your code that is doing the scanning Steven?
> >>
> >>
> >>
> >> On Thu, Apr 22, 2010 at 3:50 PM, Michael Segel
> >> <michael_segel@hotmail.com> wrote:
> >> >
> >> > Ok...
> >> >
> >> > This is something that I think we'll need input from a major contributor...
> >> >
> >> > It looks like there may be an issue with respect to row locking...
> >> >
> >> > I guess the questions to ask are:
> >> >
> >> > - How does HBase handle row level locking?
> >> > -Concurrent reads/fetches of the same row?
> >> >
> >> > To be honest and fair, HBase is still an immature product when compared to databases, and there are going to be some issues that need to be fleshed out.  (Let's see where we are in 20+ years ;-)
> >> >
> >> > I wish I knew more about the internals of HBase, but there are only so many hours in the day and my wife forces me to work so I can keep up with her spending. ;-) (And if any of you happen to ever meet her, please don't bring this up, she'll kill me. :-D )
> >> >
> >> > Let's see what St.Ack or Andrew have to say. This might be a JIRA issue.
> >> >
> >> > Thx
> >> >
> >> > -Mike
> >> >
> >> >
> >> >
> >> >> Date: Thu, 22 Apr 2010 20:17:12 +0800
> >> >> Subject: Re: multiple scanners on same table will cause problem? Scan results change among different tries.
> >> >> From: steven.zhuang.1984@gmail.com
> >> >> To: hbase-user@hadoop.apache.org
> >> >>
> >> >> hi, Michael,
> >> >>
> >> >> Sorry for not making the question clear. There are multiple scanners
> >> >> scanning a single table, and there may be cases where multiple scanners
> >> >> read from a single region.
> >> >> Please see answers inline.
> >> >>
> >> >> On Thu, Apr 22, 2010 at 8:08 PM, Michael Segel <michael_segel@hotmail.com>wrote:
> >> >>
> >> >> >
> >> >> > I'm sorry, but are you trying to say that you have multiple scanners trying
> >> >> > to read from a single region and the result sets do not match?
> >> >>
> >> >> Yes, the result sets do not match.
> >> >>
> >> >> > I guess it would be an easy test: enter a bunch of rows into a region and
> >> >> > have a unique integer for each row (1, 2, 3, ...).
> >> >> > Then run a bunch of unfiltered scans in parallel, and generate a sum from
> >> >> > each scan. If any of the sums do not match, then you have a potential issue
> >> >> > with concurrency/row locking and row isolation level.  How does HBase handle
> >> >> > row level locking and isolation levels?
> >> >>
> >> >> I have iterated over the rows/column families/cells and printed the content of
> >> >> each cell, and found that some cells are missing in some scan result sets.
> >> >>
> >> >> > -Mike
> >> >> >
> >> >> > > Date: Thu, 22 Apr 2010 17:07:47 +0800
> >> >> > > Subject: multiple scanners on same table will cause problem? Scan results change among different tries.
> >> >> > > From: steven.zhuang.1984@gmail.com
> >> >> > > To: hbase-user@hadoop.apache.org
> >> >> > >
> >> >> > > hi, All,
> >> >> > >           Has anybody scanned one table using multiple scanners at the
> >> >> > > same time and found inconsistent results?
> >> >> > >           I am querying a table using dozens (20-120) of scanners in
> >> >> > > parallel (multiple threads), trying to take advantage of the multiple
> >> >> > > cores.
> >> >> > > But I found the scan results are not consistent across runs. I have
> >> >> > > checked my code, and there seems to be no bug in it. So I guess the problem may
> >> >> > > come from HBase itself.
> >> >> > >           My HBase version is 0.20.3.
> >> >> >
> >> >
> >
 		 	   		  