directory-dev mailing list archives

From Kiran Ayyagari <>
Subject Re: [Mavibot] BtreeHeader management, transactions and other things...
Date Mon, 14 Sep 2015 02:40:25 GMT
On Mon, Sep 14, 2015 at 5:44 AM, Emmanuel Lécharny <> wrote:

> Hi,
> I have looked at the code today, and I found that the way we handle the
> BtreeHeader is a bit complex, and does not fit some other ideas I had
> regarding the management of transactions.
> Currently, we store a map of BH where for each BTree, we have the latest
> BH (ie, the one associated with the most recent revision). When we want
> to update a btree, or read it, we first check this map and use the
> returned BH to start updating or reading the BTree.
> This is not good, IMO.
> Actually, we should always fetch the most recent revision for a given
> BTree from the BOB. That changes the implementation of the
> getBtreeHeader() method.
> Why should we do it differently, and how does it connect with the TXNs?
> It's that simple (well, sort of). Txns will hold in a working memory (WM)
> all the pages that will be updated from the beginning to the end of the
> transaction, allowing us to avoid many updates on disk - currently, the
> way we process transactions is pretty brutal: we write the modified
> pages on disk, until the end of the txn, even if we might very well
> modify one of those pages again -.
> So the 'new way' should update the pages we have in the WM. That is
> possible if we reference pages using their offset, but then that changes
> the way we process the pages (currently, we preemptively copy a page
> that we are going to modify). We will *not* copy a page anymore if it's
> present in the WM, we will just update it. At the end, the WM will
> contain all the modified pages, and we will just have to write them to
> disk (or discard them) when we commit (or rollback) the transaction.
> But the current code has only two ways to fetch a page:
> - either it's in the cache, and we return it
> - or we read the page from disk
> (This is what the PersistedPageHolder.getValue() does)
> We need to add a third possibility: to get the page from the WM, when
> we are updating the BTree, and if it's not present in the WM, then fetch
> it (from the cache or the disk) and put it into the WM.
> Then the update (insert or delete) must be done without creating a copy.
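
The three lookup paths and the commit/rollback step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: Page, PageStore, WriteTxn and getForUpdate() are hypothetical names, not Mavibot's actual classes, and the sketch copies a page once when it first enters the WM so that a rollback can simply discard it (the thread does not say whether Mavibot would do that).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names are Mavibot's actual API.
final class WorkingMemorySketch {
    /** A page identified by its offset; 'data' stands in for the page content. */
    static final class Page {
        final long offset;
        String data;

        Page(long offset, String data) {
            this.offset = offset;
            this.data = data;
        }
    }

    /** Stand-in for the page store: a cache in front of a "disk" map. */
    static final class PageStore {
        final Map<Long, String> disk = new HashMap<>();
        final Map<Long, Page> cache = new HashMap<>();
        int diskWrites = 0;

        /** The two existing paths: return the cached page, else read it from disk. */
        Page read(long offset) {
            Page cached = cache.get(offset);
            if (cached != null) {
                return cached;
            }
            Page loaded = new Page(offset, disk.get(offset));
            cache.put(offset, loaded);
            return loaded;
        }

        /** Persist one page and refresh the cache with the committed content. */
        void write(Page page) {
            disk.put(page.offset, page.data);
            cache.put(page.offset, new Page(page.offset, page.data));
            diskWrites++;
        }
    }

    /** A write transaction holding every modified page in its working memory (WM). */
    static final class WriteTxn {
        private final PageStore store;
        private final Map<Long, Page> wm = new HashMap<>();

        WriteTxn(PageStore store) {
            this.store = store;
        }

        /** Third path: WM first; on a miss, fetch from cache/disk and put it in the WM. */
        Page getForUpdate(long offset) {
            Page page = wm.get(offset);
            if (page == null) {
                // Assumption: copy once on first touch, then update in place.
                page = new Page(offset, store.read(offset).data);
                wm.put(offset, page);
            }
            return page;
        }

        /** Write each modified page exactly once, however often it was updated. */
        void commit() {
            for (Page page : wm.values()) {
                store.write(page);
            }
            wm.clear();
        }

        /** Discard the modified pages; nothing has reached the disk. */
        void rollback() {
            wm.clear();
        }
    }
}
```

With this shape, a page updated several times inside one transaction is written to disk only once, at commit time, and readers going through read() keep seeing the committed content until then.
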
> That is a huge change in the code... But this is necessary if we want to
> have an efficient transaction handling. It also allows us to get rid of
> those synchronized Maps containing the BTreeHeaders.
> One more thing (à la Apple): we most certainly don't need to manage
> multiple values with sub-btrees in Mavibot : As soon as we have a fully
> working transaction system, we could perfectly expect the application to
> deal with such a specific case : all in all, in a Btree<K, V>, where V
> is the user's data structure, it's up to the user to make V a BTree, and
> to deal with it. As we will have a cross-BTree transaction system, it
> won't be expensive, plus this is already what we do with JDBM, so the
> ApacheDS code will not be difficult to port.
We should move to explicit begin() and commit() calls to support
cross-BTree transactions. This will impact the ApacheDS code a bit,
because we will now need a txn handle to pass around.
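
A minimal sketch of what passing a txn handle around could look like, assuming a hypothetical Txn class obtained from begin(); Tree, Txn and insert() are illustrative names only, and a TreeMap stands in for the real page structure:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: begin(), Txn and Tree are not Mavibot's actual API.
final class CrossBtreeTxnSketch {
    /** Stand-in for one B-tree; 'committed' is what readers would see. */
    static final class Tree {
        final TreeMap<String, String> committed = new TreeMap<>();
    }

    /** The handle callers pass around; it collects changes for any number of trees. */
    static final class Txn {
        private final Map<Tree, TreeMap<String, String>> pending = new HashMap<>();
        private boolean open = true;

        void insert(Tree tree, String key, String value) {
            ensureOpen();
            pending.computeIfAbsent(tree, t -> new TreeMap<>()).put(key, value);
        }

        /** Apply the pending changes of every tree touched by this transaction. */
        void commit() {
            ensureOpen();
            pending.forEach((tree, changes) -> tree.committed.putAll(changes));
            pending.clear();
            open = false;
        }

        /** Drop all pending changes without touching any tree. */
        void rollback() {
            ensureOpen();
            pending.clear();
            open = false;
        }

        private void ensureOpen() {
            if (!open) {
                throw new IllegalStateException("transaction is closed");
            }
        }
    }

    /** Explicit start of a transaction; the caller owns the returned handle. */
    static Txn begin() {
        return new Txn();
    }
}
```

The impact on calling code is visible in the signature of insert(): every update takes the handle, so code that used to call a tree directly has to receive the Txn from whoever called begin().
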

> A bit of work on our plates ;-)

> Thoughts ?

Kiran Ayyagari
