hbase-user mailing list archives

From Clint Morgan <clint.mor...@troove.net>
Subject Re: Secondary indexes and transactions
Date Tue, 19 Jan 2010 00:29:14 GMT
After the 2PC process has determined that a commit should happen, there is no
roll-back. The commit must be processed.

So in your example, a commit has been approved, and one of the regions
is told to go ahead and commit. The region triggers the index Put, but then
fails on its own Puts (out of disk space, out of memory, etc.). This should
shut down the RegionServer. Then, when the region's WAL is recovered, the
trx puts from the partially committed transaction will be there. We will
look in the global transaction log to see that the trx is to be committed,
and then apply the puts to the base table.
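
The recovery step described above can be sketched roughly as follows. This is
only a toy model of the idea, not the transactional contrib's actual code:
the names Decision, WalEntry, Region and recover are hypothetical, and the
global transaction log is assumed to be a simple mapping from transaction id
to the 2PC outcome.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    """Outcome recorded in the (assumed) global transaction log."""
    COMMIT = "commit"
    ABORT = "abort"


@dataclass
class WalEntry:
    """One transactional put found in the region's WAL (hypothetical shape)."""
    trx_id: int
    row: str
    value: str


@dataclass
class Region:
    base_table: dict = field(default_factory=dict)
    wal: list = field(default_factory=list)

    def recover(self, global_log):
        """Replay the WAL after a crash.

        A put is applied to the base table only if the global log says its
        transaction was to be committed. Aborted or unknown transactions
        are skipped.
        """
        for entry in self.wal:
            if global_log.get(entry.trx_id) is Decision.COMMIT:
                self.base_table[entry.row] = entry.value
        return self.base_table


# Transaction 7 was approved by 2PC before the crash; transaction 8 was not.
global_log = {7: Decision.COMMIT, 8: Decision.ABORT}
region = Region(wal=[WalEntry(7, "row1", "a"), WalEntry(8, "row2", "b")])
region.recover(global_log)  # base table now holds only row1
```

Because replaying a put just rewrites the same value, running recover more
than once leaves the base table in the same state, which is what makes it
safe to finish a commit that was interrupted partway through.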

-clint

On Fri, Jan 15, 2010 at 2:43 AM, Mridul Muralidharan
<mridulm@yahoo-inc.com> wrote:

> I think I might not have explained it well enough.
> As part of executing a Put, the index update currently happens before the
> underlying transactional table is updated, and is done outside of the
> locks.
> If the underlying transactional table update results in an exception, what
> is the state of the index? From what I understand, a rollback is initiated,
> and this rolls back all regions except the one that threw the exception,
> so the secondary index update that happened implicitly is never reverted.
> Or am I missing something here?
>
> To be clear, I am talking about the actual commit, as part of the two-phase
> commit, throwing an exception: not a conflict exception, but an IOException
> or variant, which can result in the secondary index going out of sync.
> I am contrasting it with the case of explicit indexes maintained by the
> client, where the rollback by the client (when the commit fails for a
> region) results in a rollback on all the regions in the transaction, which
> includes the secondary indexes 'visible' to the client.
>
> Thanks,
> Mridul
>
>> If the regionserver crashes during this commit process, then I *think* it
>> should still recover correctly. It will see the transactional operations in
>> the WAL, and then propagate the puts into the index. However, this WAL
>> recovery stuff has been changing, and I'm not confident that it currently
>> works in all failure cases.
>>
>> Does this normal case address your concerns?
>>
>> -clint
>>
>> On Sun, Jan 3, 2010 at 4:46 PM, Mridul Muralidharan
>> <mridulm@yahoo-inc.com> wrote:
>>
>>  stack wrote:
>>>
>>>  On Sun, Jan 3, 2010 at 10:46 AM, Mridul Muralidharan
>>>> <mridulm@yahoo-inc.com> wrote:
>>>>
>>>>> I was wondering about the atomicity guarantees when using secondary
>>>>> indexes from within a transaction.
>>>>>
>>>>>
>>>> You are talking about indexed hbase from the transactional hbase contrib?
>>>>>
>>>>
>>> Yes, exactly.
>>>
>>>
>>>
>>>>> From what I could gather, updates to the index table go through their own
>>>>> (set of) RPCs before the underlying transactional table is updated, and
>>>>> these updates happen outside of the locks for the transaction table.
>>>>>
>>>>>
>>>> Yes. But IIUC, the client is running a transaction that spans the update
>>>> to the two tables. It'll take care of the undo should, say, the update to
>>>> the transaction table fail.
>>>>
>>>>
>>> Isn't the update to the secondary index implicitly done? As in, does the
>>> client 'see' this update?
>>> My impression was that the secondary index update was done by the
>>> indexedregion and was not visible to the client, which manages the OCC
>>> transaction ...
>>>
>>>
>>>
>>>
>>>>> Also, the index regions need not be colocated with the table region.
>>>>> So essentially wondering
>>>>> a) if the index can go out of sync with the transactional table?
>>>>>
>>>>>
>>>> It should not. The client should run the undos if the insert does not go
>>>> into both tables successfully.
>>>>
>>>>
>>>>
>>>>> b) if there are errors with the update to the table, are the indexes
>>>>> rolled back?
>>>>>
>>>>>
>>>> Yes.
>>>>
>>>>
>>>>
>>>>> c) whether there can be issues if parallel updates are invoked for the
>>>>> same row: whether index changes end up being inconsistent with table
>>>>> data (due to the lock not being held while updating the index).
>>>>>
>>>>>
>>>> This might be possible. There is a lock held on a row. I'm not sure if the
>>>> lock is held on the transaction table row while the update is being done
>>>> to the index table.
>>>>
>>>> This is the doc. as it stands on transactional hbase:
>>>>
>>>>
>>>> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/transactional/package-summary.html#package_description
>>>>
>>>> Here is the doc. on indexed-transactional hbase:
>>>>
>>>>
>>>> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/client/tableindexed/package-summary.html#package_description
>>>>
>>>> You've probably tripped over it already but just in case, it might help.
>>>>
>>>>
>>> I did go through the package summaries, thanks, which is what increased my
>>> confusion.
>>>
>>> My current understanding is:
>>>
>>> a) The client 'simulates' the transaction by inspecting the state of the
>>> rows on commit, and rolls back in case of conflicting updates.
>>>
>>> b) Secondary index updates are transparent to the client API and are done
>>> directly by the indexedregion as part of its implementation.
>>>
>>>
>>> If this is correct, I am wondering if overlapping rollbacks can result in
>>> the secondary index going out of sync with the table, since (a) does not
>>> see those (one update gets rolled back while another goes through, or
>>> variations of it).
>>>
>>>
>>>
>>> Thanks,
>>> Mridul
>>>
>>>
>>>
>>>  St.Ack
>>>
>>>>
>>>>
>>>>> I guess they are all kind of related queries.
>>>>>
>>>>> I was not able to get a clear picture from the archives, so RTFM/pointers
>>>>> would be helpful if this is already answered.
>>>>>
>>>>> Thanks,
>>>>> Mridul
>>>>>
>>>>>
>>>>>
>
