I looked at the code today and found that the way we handle the
BTreeHeader is a bit complex and does not fit with some other ideas I have
regarding transaction management.
Currently, we store a map of BTreeHeaders (BH) where, for each BTree, we
keep the latest BH (i.e., the one associated with the most recent
revision). When we want to update or read a BTree, we first check this
map and use the returned BH to start updating or reading the BTree.
This is not good, IMO.
Actually, we should always fetch the most recent revision of a given
BTree from the BOB. That changes the implementation of the
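To make the BOB lookup concrete, here is a minimal Java sketch. It assumes the BOB is keyed by (BTree name, revision) and stores the offset of the matching BTreeHeader; `BobKey`, `BobSketch` and `latestHeaderOffset` are hypothetical names, and a `TreeMap` stands in for the real on-disk BTree. The latest revision is found with a single floor lookup, so no separate map of BTreeHeaders has to be maintained.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical BOB key: a BTree name plus one of its revisions.
final class BobKey {
    final String name;
    final long revision;

    BobKey(String name, long revision) {
        this.name = name;
        this.revision = revision;
    }

    String name() { return name; }
    long revision() { return revision; }
}

// In-memory stand-in for the BTree of BTrees (BOB), assumed to map
// (name, revision) to the offset of that revision's BTreeHeader.
final class BobSketch {
    // Sorted by name, then by revision, so all revisions of a BTree are adjacent.
    private final TreeMap<BobKey, Long> bob = new TreeMap<>(
            Comparator.comparing(BobKey::name).thenComparingLong(BobKey::revision));

    void put(String name, long revision, long headerOffset) {
        bob.put(new BobKey(name, revision), headerOffset);
    }

    // Header offset of the most recent revision of `name`, or null if the BOB
    // holds none. floorEntry returns the greatest key <= (name, Long.MAX_VALUE),
    // which is the newest revision of `name` whenever one exists.
    Long latestHeaderOffset(String name) {
        Map.Entry<BobKey, Long> e = bob.floorEntry(new BobKey(name, Long.MAX_VALUE));
        return (e != null && e.getKey().name().equals(name)) ? e.getValue() : null;
    }
}
```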
Why should we do it differently, and how does it connect with the transactions?
It's simple (well, sort of). Txns will hold in a working memory (WM) all
the pages that are updated from the beginning to the end of the
transaction, which lets us avoid many updates on disk. Currently, the
way we process transactions is pretty brutal: we write the modified
pages to disk as we go, until the end of the txn, even though we may very
well modify one of those pages again.
So the 'new way' should update the pages we hold in the WM. That is
possible if we reference pages by their offset, but it changes the way
we process pages (currently, we preemptively copy any page we are going
to modify). We will *not* copy a page anymore if it's already present in
the WM; we will just update it. At the end, the WM will contain all the
modified pages, and we will just have to write them to disk (or discard
them) when we commit (or roll back) the transaction.
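A minimal sketch of the working-memory idea, assuming pages are identified by their offset. `TxnPage`, `FakeDisk` and `WriteTxn` are hypothetical, and page content is a plain `String` for brevity. For simplicity the commit writes each page back at the offset it was read from; a copy-on-write store like Mavibot would instead write the new pages at fresh offsets so older revisions stay readable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical page: an offset plus a mutable payload (a String, for brevity).
final class TxnPage {
    final long offset;
    String content;

    TxnPage(long offset, String content) {
        this.offset = offset;
        this.content = content;
    }
}

// Hypothetical stand-in for the file: offset -> content, with a write counter.
final class FakeDisk {
    final Map<Long, String> blocks = new HashMap<>();
    int writes = 0;

    // Each read returns a fresh TxnPage, i.e. a private copy of the stored content.
    TxnPage read(long offset) {
        return new TxnPage(offset, blocks.get(offset));
    }

    void write(TxnPage page) {
        blocks.put(page.offset, page.content);
        writes++;
    }
}

// Write transaction keeping every page it modifies in a working memory (WM),
// keyed by page offset.
final class WriteTxn {
    private final FakeDisk disk;
    private final Map<Long, TxnPage> wm = new HashMap<>();

    WriteTxn(FakeDisk disk) {
        this.disk = disk;
    }

    // First access copies the page into the WM; later accesses return the
    // same instance, so the page is never copied twice.
    TxnPage pageForUpdate(long offset) {
        return wm.computeIfAbsent(offset, disk::read);
    }

    // Update in place: nothing reaches the disk before commit.
    void update(long offset, String newContent) {
        pageForUpdate(offset).content = newContent;
    }

    // Commit: each modified page is written once, however often it was updated.
    // (Simplification: written back at its original offset.)
    void commit() {
        wm.values().forEach(disk::write);
        wm.clear();
    }

    // Rollback: the disk was never touched, so dropping the WM is enough.
    void rollback() {
        wm.clear();
    }
}
```

Updating the same page twice and then committing costs a single disk write, which is exactly what the current "write as we go" approach cannot offer.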
But the current code has only two ways to fetch a page:
- either it's in the cache, and we return it
- or we read the page from disk
(This is what PersistedPageHolder.getValue() does.)
We need to add a third possibility: get the page from the WM when we are
updating the BTree and, if it's not present in the WM, fetch it (from
the cache or the disk) and put it into the WM.
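The three fetch paths could look roughly like this. `PageSource`, `fetchForRead` and `fetchForUpdate` are hypothetical names (this is not Mavibot's `PersistedPageHolder` API), and `HashMap`s stand in for the page cache, the file and the WM.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical three-level page lookup: WM, then cache, then disk.
final class PageSource {
    final Map<Long, String> disk = new HashMap<>();   // stand-in for the file
    final Map<Long, String> cache = new HashMap<>();  // stand-in for the page cache
    int diskReads = 0;

    // The two existing paths: return the cached page, else read it from disk
    // and cache it.
    String fetchForRead(long offset) {
        String page = cache.get(offset);
        if (page == null) {
            diskReads++;
            page = disk.get(offset);
            cache.put(offset, page);
        }
        return page;
    }

    // The third path, used only while updating inside a transaction: look in
    // the transaction's WM first; on a miss, go through the cache/disk path
    // and register the page in the WM.
    String fetchForUpdate(Map<Long, String> wm, long offset) {
        return wm.computeIfAbsent(offset, this::fetchForRead);
    }
}
```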
Then the update (insert or delete) must be done without creating a copy.
That is a huge change in the code... But this is necessary if we want
efficient transaction handling. It also allows us to get rid of those
synchronized Maps containing the BTreeHeaders.
One more thing (à la Apple): we most certainly don't need to manage
multiple values with sub-btrees in Mavibot. As soon as we have a fully
working transaction system, we can perfectly well expect the application
to deal with this specific case: after all, in a BTree<K, V>, where V is
the user's data structure, it's up to the user to make V a BTree and to
deal with it. As we will have a cross-BTree transaction system, it won't
be expensive; plus, this is already what we do with JDBM, so the
ApacheDS code will not be difficult to port.
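A sketch of what "the application deals with multiple values" could mean, using `TreeMap`/`TreeSet` as stand-ins for BTrees (`MultiValueIndex` is a hypothetical name). In Mavibot, the inner set would itself be a BTree, updated in the same cross-BTree transaction as the outer one.

```java
import java.util.Collections;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// The application keeps several values per key by making V a sorted set.
// TreeMap and TreeSet stand in for BTrees; keys and values must be Comparable.
final class MultiValueIndex<K, V> {
    private final TreeMap<K, TreeSet<V>> outer = new TreeMap<>();

    // Add a value under a key, creating the inner "tree" on first use.
    void add(K key, V value) {
        outer.computeIfAbsent(key, k -> new TreeSet<>()).add(value);
    }

    // Remove one value; drop the key once its inner set is empty.
    boolean remove(K key, V value) {
        TreeSet<V> values = outer.get(key);
        if (values == null || !values.remove(value)) {
            return false;
        }
        if (values.isEmpty()) {
            outer.remove(key);
        }
        return true;
    }

    // Read-only view of the values stored under a key (empty if none).
    NavigableSet<V> values(K key) {
        TreeSet<V> values = outer.get(key);
        return values == null
                ? Collections.<V>emptyNavigableSet()
                : Collections.unmodifiableNavigableSet(values);
    }

    boolean containsKey(K key) {
        return outer.containsKey(key);
    }
}
```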
A bit of work on our plates ;-)