directory-dev mailing list archives

From Kiran Ayyagari <>
Subject Re: [Mavibot] BtreeHeader management, transactions and other things...
Date Mon, 14 Sep 2015 08:40:04 GMT
On Mon, Sep 14, 2015 at 4:31 PM, Emmanuel Lécharny <> wrote:

> On 14/09/15 04:40, Kiran Ayyagari wrote:
> > On Mon, Sep 14, 2015 at 5:44 AM, Emmanuel Lécharny <>
> > wrote:
> >
> >> Hi,
> >>
> >> I have looked at the code today, and I found that the way we handle the
> >> BtreeHeader is a bit complex, and does not fit some other ideas I had
> >> regarding the management of transactions.
> >>
> >> Currently, we store a map of BHs where, for each BTree, we have the latest
> >> BH (i.e., the one associated with the most recent revision). When we want
> >> to update a BTree, or read it, we first check this map and use the
> >> returned BH to start updating or reading the BTree.
> >>
> >> This is not good, IMO.
> >>
> >> Actually, we should always fetch the most recent revision of a given
> >> BTree from the BOB. That changes the implementation of the
> >> getBtreeHeader() method.
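A minimal sketch of that lookup, assuming the BOB maps a (B-tree name, revision) pair to the offset of the BtreeHeader written for that revision. `NameRevision`, `BobSketch` and the `getBtreeHeader(name, maxRevision)` signature are hypothetical, and a `TreeMap` stands in for the on-disk B-tree. The point is that the latest header comes from one ordered lookup (`floorEntry`) instead of from a separate synchronized map of latest headers:

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical BOB key: a B-tree name plus one of its revisions. */
record NameRevision(String name, long revision) {}

/**
 * Sketch of a "BTree of BTrees" (BOB): (name, revision) -> offset of the
 * BtreeHeader for that revision. A TreeMap stands in for the on-disk B-tree.
 */
class BobSketch {
    private final TreeMap<NameRevision, Long> bob = new TreeMap<>(
        Comparator.comparing(NameRevision::name)
                  .thenComparingLong(NameRevision::revision));

    /** Records that B-tree `name` has a header at `offset` for `revision`. */
    void addHeader(String name, long revision, long offset) {
        bob.put(new NameRevision(name, revision), offset);
    }

    /**
     * Offset of the most recent header of `name` whose revision is
     * <= maxRevision, or -1 if there is none.
     */
    long getBtreeHeader(String name, long maxRevision) {
        Map.Entry<NameRevision, Long> e =
            bob.floorEntry(new NameRevision(name, maxRevision));
        // floorEntry may land on another B-tree's key: check the name.
        if (e == null || !e.getKey().name().equals(name)) {
            return -1L;
        }
        return e.getValue();
    }

    /** Latest header of `name`, whatever its revision. */
    long getLatestBtreeHeader(String name) {
        return getBtreeHeader(name, Long.MAX_VALUE);
    }
}
```

Asking for `Long.MAX_VALUE` gives the most recent revision; passing a reader's revision gives the header that reader should keep using.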
> >>
> >> Why should we do it differently, and how does it connect with the TXNs?
> >> It's simple (well, sort of). Txns will hold in a working memory (WM) all
> >> the pages that are updated from the beginning to the end of the
> >> transaction, allowing us to avoid many updates on disk. Currently, the
> >> way we process transactions is pretty brutal: we write the modified
> >> pages on disk until the end of the txn, even though we might very well
> >> modify one of those pages again.
> >>
> >> So the 'new way' should update the pages we have in the WM. That is
> >> possible if we reference pages using their offset, but it changes
> >> the way we process the pages (currently, we preemptively copy a page
> >> that we are going to modify). We will *no longer* copy a page if it's
> >> present in the WM; we will just update it. At the end, the WM will
> >> contain all the modified pages, and we will just have to write them on
> >> disk (or discard them) when we commit (or roll back) the transaction.
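A minimal sketch of such a working memory, assuming pages are keyed by their offset. `WmPage`, `FakeDisk` and `WorkingMemory` are hypothetical names, not Mavibot classes. One assumption goes beyond the text above: a page is copied once when it first enters the WM, so the version other readers see stays untouched. Every later update in the same transaction mutates that WM copy in place:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical mutable page, identified by its offset in the file. */
class WmPage {
    final long offset;
    final TreeMap<String, String> entries = new TreeMap<>();

    WmPage(long offset) { this.offset = offset; }

    WmPage copy() {
        WmPage p = new WmPage(offset);
        p.entries.putAll(entries);
        return p;
    }
}

/** Stand-in for the record manager's file; counts page writes. */
class FakeDisk {
    final Map<Long, WmPage> pages = new HashMap<>();
    int writes = 0;

    void write(WmPage page) {
        pages.put(page.offset, page.copy());
        writes++;
    }
}

/** Per-transaction working memory: modified pages, keyed by offset. */
class WorkingMemory {
    private final Map<Long, WmPage> modified = new HashMap<>();

    /**
     * Updates a page. Assumption: the first touch copies the shared page into
     * the WM; later touches in the same txn update that copy in place.
     */
    void put(WmPage shared, String key, String value) {
        WmPage page = modified.computeIfAbsent(shared.offset, off -> shared.copy());
        page.entries.put(key, value);
    }

    /** Commit: write each modified page exactly once, then forget them. */
    void commit(FakeDisk disk) {
        modified.values().forEach(disk::write);
        modified.clear();
    }

    /** Rollback: drop the pending pages; nothing reaches the disk. */
    void rollback() {
        modified.clear();
    }
}
```

Three updates to the same page inside one transaction end up as a single page write on commit, while a rollback just drops the map.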
> >>
> >> But the current code has only two ways to fetch a page:
> >> - either it's in the cache, and we return it
> >> - or we read the page from disk
> >> (This is what the PersistedPageHolder.getValue() does)
> >>
> >> We need to add a third possibility: getting the page from the WM when
> >> we are updating the BTree, and, if it's not present in the WM, fetching
> >> it (from the cache or the disk) and putting it into the WM.
> >> Then the update (insert or delete) must be done without creating a copy.
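A sketch of that lookup order, assuming a per-transaction WM and a shared cache, both keyed by page offset. `PageFetcher`, `read`, `fetchForUpdate` and the `LongFunction` that stands in for the disk are hypothetical, and pages are plain strings to keep it short. Plain reads never populate the WM; only the update path does:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

/** Hypothetical page lookup with the extra working-memory (WM) step. */
class PageFetcher {
    private final Map<Long, String> workingMemory = new HashMap<>(); // per txn
    private final Map<Long, String> cache = new HashMap<>();         // shared
    private final LongFunction<String> disk;                         // slow path
    int diskReads = 0;

    PageFetcher(LongFunction<String> disk) {
        this.disk = disk;
    }

    /** Read path, as today: the cache first, then the disk (filling the cache). */
    String read(long offset) {
        return cache.computeIfAbsent(offset, this::readFromDisk);
    }

    /** Update path: the WM first; on a miss, read via cache/disk and keep it in the WM. */
    String fetchForUpdate(long offset) {
        return workingMemory.computeIfAbsent(offset, this::read);
    }

    boolean inWorkingMemory(long offset) {
        return workingMemory.containsKey(offset);
    }

    private String readFromDisk(long offset) {
        diskReads++;
        return disk.apply(offset);
    }
}
```

A second `fetchForUpdate` on the same offset returns the WM version directly, which is the step the current two-way lookup lacks.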
> >>
> >> That is a huge change in the code... But this is necessary if we want to
> >> have efficient transaction handling. It also allows us to get rid of
> >> those synchronized Maps containing the BTreeHeaders.
> >>
> >> One more thing (à la Apple): we most certainly don't need to manage
> >> multiple values with sub-btrees in Mavibot. As soon as we have a fully
> >> working transaction system, we can perfectly well expect the application
> >> to deal with such a specific case: all in all, in a Btree<K, V>, where V
> >> is the user's data structure, it's up to the user to make V a BTree and
> >> to deal with it. As we will have a cross-B-tree transaction system, it
> >> won't be expensive; plus, this is already what we do with JDBM, so the
> >> ApacheDS code will not be difficult to port.
> >>
> > we should move to explicit begin() and commit() to support the cross
> > Btree transactions,
> Absolutely. The current transaction support is less than half-baked: it
> only supports per-BTree transactions, which is pretty much useless.
> Actually, we do need to pass a context to each Btree operation, a context
> that holds the revision of each B-tree that is part of the transaction.
> That also means we have one common revision for each transaction, a
> revision which ties together all the revisions of the Btrees that were
> part of the transaction.
> Typically, in an ApacheDS context, when we update an entry, we not only
> update the master BTree, but also the RDN btree and every btree associated
> with each index. And all those updates must be visible as a whole when
> we want to fetch an entry or an index for this revision.
> The solution would be to use the transaction ID (which will be
> incremental) as the revision number for all the b-trees being updated
> during that transaction. That saves us the pain of keeping track of
> the various B-trees' revisions when applying an operation to a given
> revision of a b-tree.
> +1 that is a good idea
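A minimal sketch of explicit begin()/commit() where the incremental transaction ID doubles as the revision of every B-tree the transaction writes. `Txn`, `TxnManager` and their methods are hypothetical; a `HashMap` of committed revisions stands in for the BOB, and the sketch assumes a single writer at a time (no conflict detection):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical per-transaction context passed to every B-tree operation. */
class Txn {
    final long id;                     // also the revision of every B-tree it writes
    final Map<String, Long> snapshot;  // committed revisions when the txn began
    final Set<String> updated = new HashSet<>();

    Txn(long id, Map<String, Long> snapshot) {
        this.id = id;
        this.snapshot = snapshot;
    }

    /** Revision of `btree` this transaction operates on (0 = never written). */
    long revisionOf(String btree) {
        return updated.contains(btree) ? id : snapshot.getOrDefault(btree, 0L);
    }

    /** Records a write: the new revision of `btree` is the transaction ID. */
    void write(String btree) {
        updated.add(btree);
    }
}

/** Sketch of explicit begin()/commit() across several B-trees. */
class TxnManager {
    private long nextId = 1;
    private final Map<String, Long> committed = new HashMap<>();

    /** Starts a txn with a fresh, incremental ID and a snapshot of revisions. */
    synchronized Txn begin() {
        return new Txn(nextId++, new HashMap<>(committed));
    }

    /** Publishes all of the txn's B-tree revisions in one step. */
    synchronized void commit(Txn txn) {
        for (String btree : txn.updated) {
            committed.put(btree, txn.id);
        }
    }

    synchronized long committedRevision(String btree) {
        return committed.getOrDefault(btree, 0L);
    }
}
```

A reader that began before the commit keeps the revisions from its snapshot, so the master, RDN and index updates become visible together, all tagged with the same revision.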

Kiran Ayyagari
