directory-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Emmanuel Lécharny <>
Subject Re: [Mavibot] Write Transaction support
Date Thu, 12 Dec 2013 07:31:27 GMT
Le 12/11/13 11:38 PM, Howard Chu a écrit :
> Emmanuel Lécharny wrote:
>> Hi guys,
>> while 1.0.0-M3 is being voted, I started to work on the next iteration.
>> Kiran is also working on the bulk load tool at the same time, so I will
>> work on the transaction support.
>> Updating the bTree is a fairly costly operation, as we have to copy many
>> pages, reclaim some free pages to store new data, and update the header
>> many times. We could save many of those writes if we do it at the end of
>> the update. One of the biggest save would be to write the header only
>> once (currently, it's being updated every time we ask for a free page).
>> But even if we implement such a mechanism, there are cases where we
>> would like to save more. Here are two possible use cases :
>> - we are injecting many entries in a BTree
>> - we are updating many BTrees but we want a consistant backend even if
>> there is a crash in the middle of all those updates, across oall the
>> updated BTrees (this is typically what we have to guarantee when
>> updating entries in LDAP).
>> So where does it bring us ? We need to avoid writing to disk as much as
>> we can up to the point we can do it safely. The idea is to copy the
>> pages in memory, without flushing them on disk, until we commit (or
>> rollback) the operation. If we have to copy the same page more than
>> once, then we will update the copy in memory. In any case, we operate on
>> a given revision for all the operation, regardless the number of
>> updates.
>> When the user is done, we have two use cases :
>> - a commit is called :
>> we flush on disk the pages we have stored in memory, and when it's done,
>> we update the header. We are all good.
>> - a rollback is called :
>> there is nothing else to do than to discard all the pages we copied in
>> memory
>> The only pb is to manage the free pages and the deleted pages. The free
>> pages will be reclaimed using the existing list of free pages, and
>> eventually from the end of the file if we don't have enough free pages.
>> This is quite easy. The deleted pages is more complicated to deal with,
>> as we may have many pages we will free. But again, those deleted pages
>> can be put into the free page list. Worst case is if we can't update the
>> copiedPage BTree : we may lose some pages (but that would not corrupt
>> the backend).
>> This is what I foresee, I need to go a bit deeper in the algorithm, but
>> I don't think I'm too far from something that flies.
> LMDB already does these steps. The only thing that's actually critical
> to proper operation is to defer the header update to the end of the
> commit, so keeping a copy of this page in memory is critical. Then
> commit or rollback is simply a matter of writing the header back to
> disk or not, no other page manipulations will be visible without the
> header update. The other thing is to make sure you're only using free
> pages that can't be seen by any active transactions.

Thanks Howard. I'm pleased to see that I'm not missing any parts, at
least at this level. Now, the implementation wil certainly not be easy
as writing this mail :-)

Emmanuel Lécharny 

View raw message