hbase-user mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: Row level locking?
Date Fri, 16 Jul 2010 20:24:04 GMT
Explicit locks with ZooKeeper would be (a) slow and (b) completely out
of band and ultimately up to you.  I wouldn't exactly be eager to do
our row locking in ZooKeeper (since the minimum operation time is
between 2 and 10 ms).

You could do application advisory locks, but that is true no matter
what datastore you use...
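
For illustration only, here is a minimal sketch of what such an application
advisory lock on top of ZooKeeper might look like (the class name, znode
layout, and timeouts are all made up, and the parent znodes under /locks are
assumed to already exist).  Note that every tryLock/unlock is a round trip to
the quorum, which is exactly the 2-10 ms per-operation cost mentioned above:

  import java.io.IOException;
  import org.apache.zookeeper.CreateMode;
  import org.apache.zookeeper.KeeperException;
  import org.apache.zookeeper.WatchedEvent;
  import org.apache.zookeeper.Watcher;
  import org.apache.zookeeper.ZooDefs;
  import org.apache.zookeeper.ZooKeeper;

  // Hypothetical advisory lock: it only means something to clients that
  // agree to check it -- HBase itself knows nothing about it.
  public class ZkRowLock {
    private final ZooKeeper zk;

    public ZkRowLock(String quorum) throws IOException {
      // Minimal watcher; a real client would handle session events.
      this.zk = new ZooKeeper(quorum, 30000, new Watcher() {
        public void process(WatchedEvent event) { }
      });
    }

    // Returns false if another client already holds the lock.
    // Assumes /locks/<table> has been created beforehand.
    public boolean tryLock(String table, String row)
        throws KeeperException, InterruptedException {
      try {
        // Ephemeral node: the lock goes away if our session dies.
        zk.create("/locks/" + table + "/" + row, new byte[0],
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        return true;
      } catch (KeeperException.NodeExistsException e) {
        return false;
      }
    }

    public void unlock(String table, String row)
        throws KeeperException, InterruptedException {
      zk.delete("/locks/" + table + "/" + row, -1);  // -1 = any version
    }
  }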

On Fri, Jul 16, 2010 at 1:13 PM, Guilherme Germoglio
<germoglio@gmail.com> wrote:
> What about implementing explicit row locks using ZooKeeper? I'm planning
> to do this sometime in the near future. Does anyone have any comments
> against this approach?
>
> (or maybe it was already implemented by someone :-)
>
> On Fri, Jul 16, 2010 at 5:02 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>
>> HTable.close does very little:
>>
>>  public void close() throws IOException{
>>    flushCommits();
>>  }
>>
>>
>> None of which involves row locks.
>>
>> One thing to watch out for is to remember to close your scanners -
>> they continue to use server-side resources until you close them or
>> they get timed out after 60 seconds.  Also be very wary of using any
>> of the explicit row locking calls; they are generally trouble for more
>> or less everyone.  There was a proposal to remove them, but I don't
>> think that went through.
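>>
>> As a rough sketch of the scanner pattern (table and column names are made
>> up; this assumes an open HTable called table and the usual
>> org.apache.hadoop.hbase.client and Bytes imports):
>>
>>   ResultScanner scanner = table.getScanner(new Scan());
>>   try {
>>     for (Result row : scanner) {
>>       // do something with each row
>>       byte[] value = row.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
>>     }
>>   } finally {
>>     // releases the scanner lease on the region server instead of waiting
>>     // for the 60 second timeout
>>     scanner.close();
>>   }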
>>
>>
>> On Fri, Jul 16, 2010 at 9:16 AM, Cosmin Lehene <clehene@adobe.com> wrote:
>> >
>> > On Jul 16, 2010, at 6:41 PM, Michael Segel wrote:
>> >
>> >
>> >
>> > Thanks for the response.
>> > (You don't need to include the cc ...)
>> >
>> > With respect to the row level locking ...
>> > I was interested in when the lock is actually acquired, how long the lock
>> > persists, and when the lock is released.
>> > From your response, the lock is only held while updating the row - while
>> > the data is being written to the memory cache, which is then written to
>> > disk. (Note: this row level locking is different from transactional row
>> > level locking.)
>> >
>> > Now that I've had some caffeine I think I can clarify... :-)
>> >
>> > Some of my developers complained that they were having trouble with two
>> > different processes trying to update the same table.
>> > Not sure why they were having the problem, so I wanted a good fix. The
>> > simple fix was to have them call close() on the HTable connection, which
>> > forces any resources they acquired to be released.
>> >
>> >
>> > It would help to know what the exact problem was. Normally I wouldn't see
>> any problems.
>> >
>> >
>> > In looking at the problem... it's possible that they didn't have autoFlush
>> > set to true, so the write was still in the buffer and hadn't gotten flushed.
>> >
>> > If the lock only persists for the duration of the write to memory and is
>> > then released, then the issue could have been that the record written was
>> > still in the buffer and not yet flushed to disk.
>> >
>> >
>> > At the region server level HBase will use the cache for both reads and
>> > writes. This happens transparently for the user. Once something is written
>> > to the cache, all other clients will read from the same cache. No need to
>> > worry about whether the cache has been flushed.
>> > Lars George has a good article about the HBase storage architecture:
>> > http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
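>> >
>> > As a quick sketch (table, row and column names are made up), a plain get
>> > from any client will return the latest value whether it is still in the
>> > region server cache or already flushed to disk:
>> >
>> >   Get get = new Get(Bytes.toBytes("row-1"));
>> >   get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual"));
>> >   Result result = table.get(get);
>> >   byte[] latest = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("qual"));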
>> >
>> > I'm also assuming that when you run a scan() against a region, any
>> > information written to the buffer but not yet written to disk will be
>> > missed.
>> >
>> >
>> > When you do puts into HBase you'll use HTable. The HTable instance is on
>> > the client. HTable keeps a buffer as well, and if autoFlush is false it
>> > only flushes when you call flushCommits(), when it reaches the buffer
>> > limit, or when you close the table. With autoFlush set to true it will
>> > flush on every put.
>> > This buffer is on the client. So when data is actually flushed it is sent
>> > to the region server, where it goes into the region server cache and the
>> > WAL.
>> > Until a client flushes the put, no other client can see the data, because
>> > it still resides only on the client. Depending on what you need to do, you
>> > can use autoFlush true if you are doing many small writes that need to be
>> > seen immediately by others, or you can use autoFlush false and issue
>> > flushCommits() yourself, or rely on the buffer limit for that.
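>> >
>> > Roughly, the two modes look like this (table, column and buffer size are
>> > made up; manyPuts stands for whatever collection of Puts you build):
>> >
>> >   // Small writes that others must see immediately: flush on every put.
>> >   table.setAutoFlush(true);
>> >   Put p = new Put(Bytes.toBytes("row-1"));
>> >   p.add(Bytes.toBytes("cf"), Bytes.toBytes("qual"), Bytes.toBytes("value"));
>> >   table.put(p);                // sent to the region server right away
>> >
>> >   // Bulk writes: buffer on the client and flush in batches.
>> >   table.setAutoFlush(false);
>> >   table.setWriteBufferSize(4 * 1024 * 1024);  // hypothetical 4 MB buffer
>> >   for (Put put : manyPuts) {
>> >     table.put(put);            // queued in the client-side write buffer
>> >   }
>> >   table.flushCommits();        // other clients see nothing until this runs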
>> >
>> > So I guess the question isn't so much the issue of a lock, but that we
>> > need to make sure that data written to the buffer is flushed ASAP,
>> > unless we know that we're going to be writing a lot of data in the m/r job.
>> >
>> >
>> > Usually when you write heavily from the reducer it is better to use a
>> > buffer rather than autoFlush, to get good performance.
>> >
>> > Cosmin
>> >
>> >
>> > Thx
>> >
>> > -Mike
>> >
>> >
>> >
>> > From: clehene@adobe.com
>> > To: user@hbase.apache.org
>> > CC: hbase-user@hadoop.apache.org
>> > Date: Fri, 16 Jul 2010 12:34:36 +0100
>> > Subject: Re: Row level locking?
>> >
>> > Currently a row is part of a region and there's a single region server
>> serving that region at a particular moment.
>> > So when that row is updated a lock is acquired for that row until the
>> actual data is updated in memory (note that a put will be written to cache
>> on the region server and also persisted in the write-ahead log - WAL).
>> Subsequent puts to that row will have to wait for that lock.
>> >
>> > HBase is fully consistent. That being said, all the locking takes place at
>> > row level only, so when you scan you have to take that into account, as
>> > there's no range locking.
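>> >
>> > A small sketch of what row-level-only locking means in practice (all the
>> > names here are made up):
>> >
>> >   // One Put touching several columns of the same row is applied
>> >   // atomically under that row's lock: readers see both cells change together.
>> >   Put p = new Put(Bytes.toBytes("user-42"));
>> >   p.add(Bytes.toBytes("cf"), Bytes.toBytes("balance"), Bytes.toBytes("100"));
>> >   p.add(Bytes.toBytes("cf"), Bytes.toBytes("updated"), Bytes.toBytes("20100716"));
>> >   table.put(p);
>> >
>> >   // Two Puts to different rows take two independent row locks, so a scan
>> >   // may observe one change without the other -- there is no range lock.
>> >   Put p1 = new Put(Bytes.toBytes("user-42"));
>> >   p1.add(Bytes.toBytes("cf"), Bytes.toBytes("flag"), Bytes.toBytes("1"));
>> >   Put p2 = new Put(Bytes.toBytes("user-43"));
>> >   p2.add(Bytes.toBytes("cf"), Bytes.toBytes("flag"), Bytes.toBytes("1"));
>> >   table.put(p1);
>> >   table.put(p2);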
>> >
>> > I'm not sure I understand the resource releasing issue. HTable.close()
>> > flushes the current write buffer (you have a write buffer if autoFlush is
>> > set to false).
>> >
>> > Cosmin
>> >
>> >
>> > On Jul 16, 2010, at 1:33 PM, Michael Segel wrote:
>> >
>> >
>> > Ok,
>> >
>> > First, I'm writing this before I've had my first cup of coffee, so I
>> > apologize in advance if this is a brain-dead question....
>> >
>> > I'm coming from a relational background, so some of these questions may
>> > not make sense in the HBase world.
>> >
>> >
>> > When does HBase acquire a lock on a row, and how long does it persist?
>> > Does the lock only hit the current row, or does it lock the adjacent
>> > rows too?
>> > Does HBase support the concept of 'dirty reads'?
>> >
>> > The issue is what happens when two jobs hit the same table at the same
>> > time and update/read the same rows.
>> >
>> > A developer came across a problem and the fix was to use the
>> HTable.close() method to release any resources.
>> >
>> > I am wondering whether you have to clean up explicitly, or whether a lazy
>> > developer can just let the object go out of scope and get GC'd.
>> >
>> > Thx
>> >
>> > -Mike
>> >
>>
>
>
>
> --
> Guilherme
>
> msn: guigermoglio@hotmail.com
> homepage: http://sites.google.com/site/germoglio/
>
