hbase-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Row level locking?
Date Sat, 17 Jul 2010 00:12:07 GMT
Fine-grained locking is not a good use case for ZooKeeper, given its
quorum-based architecture.

Patrick

On 07/16/2010 01:24 PM, Ryan Rawson wrote:
> Explicit locks with ZooKeeper would be (a) slow and (b) completely out
> of band and ultimately up to you.  I wouldn't exactly be eager to do
> our row locking in ZooKeeper (since the minimum operation time is
> between 2 and 10 ms).
>
> You could do application advisory locks, but that is true no matter
> what datastore you use...
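
As a rough sketch of the advisory-lock idea (not from the thread), the
simplest ZooKeeper recipe uses an ephemeral znode: whoever creates it holds
the lock, and the node vanishes if the holder's session dies. The class name
and znode path below are made up, and Ryan's caveat applies: every
acquire/release costs a quorum round trip.

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Non-blocking advisory lock: whoever creates the znode holds the lock.
    public class ZkAdvisoryLock {
        private final ZooKeeper zk;
        private final String path;   // e.g. "/locks/mytable/row1" (made up)

        public ZkAdvisoryLock(ZooKeeper zk, String path) {
            this.zk = zk;
            this.path = path;
        }

        /** Returns true if we got the lock, false if someone else holds it. */
        public boolean tryLock() throws KeeperException, InterruptedException {
            try {
                zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
                          CreateMode.EPHEMERAL);
                return true;
            } catch (KeeperException.NodeExistsException e) {
                return false;   // lock already held by another session
            }
        }

        public void unlock() throws KeeperException, InterruptedException {
            zk.delete(path, -1);   // version -1 matches any version
        }
    }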
>
> On Fri, Jul 16, 2010 at 1:13 PM, Guilherme Germoglio
> <germoglio@gmail.com>  wrote:
>> What about implementing explicit row locks using ZooKeeper? I'm planning
>> to do this sometime in the near future. Does anyone have any comments
>> against this approach?
>>
>> (or maybe it was already implemented by someone :-)
>>
>> On Fri, Jul 16, 2010 at 5:02 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
>>
>>> HTable.close does very little:
>>>
>>>   public void close() throws IOException {
>>>     flushCommits();
>>>   }
>>>
>>>
>>> None of which involves row locks.
>>>
>>> One thing to watch out for is to remember to close your scanners -
>>> they continue to use server-side resources until you close them or
>>> until 60 seconds pass and they get timed out.  Also be very wary of
>>> using any of the explicit row locking calls; they are generally
>>> trouble for more or less everyone.  There was a proposal to remove
>>> them, but I don't think that went through.
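
A sketch of the scanner hygiene Ryan describes, assuming the 0.90-era
client API (the table name and the printing are made up for illustration):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ScannerCleanup {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "mytable");   // hypothetical table
            ResultScanner scanner = table.getScanner(new Scan());
            try {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            } finally {
                scanner.close();   // frees the server-side lease now, rather
                                   // than waiting for the ~60s scanner timeout
                table.close();     // just flushCommits(), as quoted above
            }
        }
    }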
>>>
>>>
>>> On Fri, Jul 16, 2010 at 9:16 AM, Cosmin Lehene <clehene@adobe.com> wrote:
>>>>
>>>> On Jul 16, 2010, at 6:41 PM, Michael Segel wrote:
>>>>
>>>>
>>>>
>>>> Thanks for the response.
>>>> (You don't need to include the cc ...)
>>>>
>>>> With respect to the row-level locking ...
>>>> I was interested in when the lock is actually acquired, how long the
>>>> lock persists, and when the lock is released.
>>>> From your response, the lock is only held while the row is being
>>>> updated, while the data is being written to the memory cache, which is
>>>> then written to disk. (Note: this row-level locking is different from
>>>> transactional row-level locking.)
>>>>
>>>> Now that I've had some caffeine I think I can clarify... :-)
>>>>
>>>> Some of my developers complained that they were having trouble with
>>>> two different processes trying to update the same table.
>>>> Not sure why they were having the problem, so I wanted to have a good
>>>> fix. The simple fix was to have them call close() on the HTable
>>>> connection, which forces any resources they acquired to be released.
>>>>
>>>>
>>>> It would help to know what the exact problem was. Normally I wouldn't see
>>> any problems.
>>>>
>>>>
>>>> In looking at the problem... it's possible that they didn't have
>>>> autoFlush set to true, so the write was still in the buffer and hadn't
>>>> gotten flushed.
>>>>
>>>> If the lock only persists for the duration of the write to memory and
>>>> is then released, then the issue could have been that the record
>>>> written was in the buffer and not yet flushed to disk.
>>>>
>>>>
>>>> At the region server level, HBase will use the cache for both reads
>>>> and writes. This happens transparently to the user. Once something is
>>>> written to the cache, all other clients will read from that same
>>>> cache. No need to worry about whether the cache has been flushed.
>>>> Lars George has a good article about the HBase storage architecture:
>>>> http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
>>>>
>>>> I'm also assuming that when you run a scan() against a region, any
>>>> information written to the buffer but not yet written to disk will be
>>>> missed.
>>>>
>>>>
>>>> When you do puts into HBase you'll use HTable. The HTable instance is
>>>> on the client. HTable keeps a buffer as well, and if autoFlush is
>>>> false it only flushes when you call flushCommits(), when it reaches
>>>> the buffer limit, or when you close the table. With autoFlush set to
>>>> true it will flush on every put.
>>>> This buffer is on the client. So when data is actually flushed, it
>>>> goes to the region server, where it ends up in the region server cache
>>>> and the WAL.
>>>> Until a client flushes the put, no other client can see the data,
>>>> because it still resides on the client only. Depending on what you
>>>> need to do, you can use autoFlush true if you are doing many small
>>>> writes that need to be seen immediately by others. You can use
>>>> autoFlush false and call flushCommits() yourself, or you can rely on
>>>> the buffer limit for that.
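
A minimal sketch of the two flush modes Cosmin describes, assuming the
0.90-era client API (the table, family, and qualifier names are made up):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FlushModes {
        // autoFlush true (the default): each put is sent to the region
        // server immediately, so other clients see it right away.
        static void writeImmediate(HTable table) throws IOException {
            table.setAutoFlush(true);
            Put p = new Put(Bytes.toBytes("row1"));
            p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
            table.put(p);
        }

        // autoFlush false: puts pile up in the client-side write buffer and
        // stay invisible to everyone else until the buffer fills, you call
        // flushCommits(), or you close the table.
        static void writeBuffered(HTable table) throws IOException {
            table.setAutoFlush(false);
            for (int i = 0; i < 10000; i++) {
                Put p = new Put(Bytes.toBytes("row" + i));
                p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
                      Bytes.toBytes("v"));
                table.put(p);
            }
            table.flushCommits();   // now the puts reach the region server
        }
    }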
>>>>
>>>> So I guess the question isn't so much the issue of a lock, but that we
>>> need to make sure that data written to the buffer should be flushed ASAP
>>> unless we know that we're going to be writing a lot of data in the m/r job.
>>>>
>>>>
>>>> Usually, when you write heavily from the reducer, it is better to use
>>>> the buffer rather than autoFlush, for good performance.
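
For that heavy-write case, the client write buffer can also be enlarged; a
sketch only (setWriteBufferSize exists on HTable in this API, but the 8 MB
figure is just an example value):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;

    public class ReducerWriteTuning {
        // 'table' is assumed to be an already-open HTable (hypothetical).
        static void tuneForBulkWrites(HTable table) throws IOException {
            table.setAutoFlush(false);                  // buffer on the client
            table.setWriteBufferSize(8 * 1024 * 1024);  // e.g. 8 MB
            // ... issue many puts here ...
            table.flushCommits();  // or let HTable.close() flush at the end
        }
    }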
>>>>
>>>> Cosmin
>>>>
>>>>
>>>> Thx
>>>>
>>>> -Mike
>>>>
>>>>
>>>>
>>>> From: clehene@adobe.com
>>>> To: user@hbase.apache.org
>>>> CC: hbase-user@hadoop.apache.org
>>>> Date: Fri, 16 Jul 2010 12:34:36 +0100
>>>> Subject: Re: Row level locking?
>>>>
>>>> Currently a row is part of a region, and there is a single region
>>>> server serving that region at any particular moment.
>>>> So when that row is updated, a lock is acquired for that row until the
>>>> actual data is updated in memory (note that a put will be written to
>>>> the cache on the region server and also persisted in the write-ahead
>>>> log, the WAL). Subsequent puts to that row will have to wait for that
>>>> lock.
>>>>
>>>> HBase is fully consistent. That said, all the locking takes place at
>>>> the row level only, so when you scan you have to take into account
>>>> that there is no range locking.
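
The explicit row-lock calls Ryan warns about above looked roughly like this
in the client API of the time (a sketch only; these calls were deprecated
and eventually removed in later HBase releases):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.RowLock;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ExplicitRowLock {
        // Holds a row lock across an update; per the thread, the internal
        // per-put locking is almost always the better choice.
        static void updateUnderLock(HTable table, byte[] row)
                throws IOException {
            RowLock lock = table.lockRow(row);
            try {
                Put p = new Put(row, lock);   // bind the put to the held lock
                p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
                      Bytes.toBytes("v"));
                table.put(p);
            } finally {
                table.unlockRow(lock);
            }
        }
    }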
>>>>
>>>> I'm not sure I understand the resource-releasing issue. HTable.close()
>>>> flushes the current write buffer (you have a write buffer if autoFlush
>>>> is set to false).
>>>>
>>>> Cosmin
>>>>
>>>>
>>>> On Jul 16, 2010, at 1:33 PM, Michael Segel wrote:
>>>>
>>>>
>>>> Ok,
>>>>
>>>> First, I'm writing this before I've had my first cup of coffee, so I
>>>> am apologizing in advance if the question is a brain-dead question....
>>>>
>>>> Since I'm coming from a relational background, some of these questions
>>>> may not make sense in the HBase world.
>>>>
>>>>
>>>> When does HBase acquire a lock on a row, and how long does it persist?
>>>> Does the lock only hit the current row, or does it also lock the
>>>> adjacent rows?
>>>> Does HBase support the concept of 'dirty reads'?
>>>>
>>>> The issue is what happens when you have two jobs hitting the same
>>>> table at the same time, updating and reading the same rows.
>>>>
>>>> A developer came across a problem and the fix was to use the
>>> HTable.close() method to release any resources.
>>>>
>>>> I am wondering if you explicitly have to clean up, or whether a lazy
>>>> developer can let the object just go out of scope and get GC'd.
>>>>
>>>> Thx
>>>>
>>>> -Mike
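
On the GC question: since close() is just flushCommits(), letting the HTable
simply get garbage-collected can silently drop puts still sitting in the
client write buffer. A minimal sketch of explicit cleanup (0.90-era API,
table and column names hypothetical):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ExplicitCleanup {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "mytable");
            try {
                Put p = new Put(Bytes.toBytes("row1"));
                p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"),
                      Bytes.toBytes("v"));
                table.put(p);
            } finally {
                table.close();   // flushes buffered puts deterministically
            }
        }
    }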
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>> --
>> Guilherme
>>
>> msn: guigermoglio@hotmail.com
>> homepage: http://sites.google.com/site/germoglio/
>>
