directory-dev mailing list archives

From Emmanuel Lécharny <elecha...@gmail.com>
Subject ApacheDS & Mavibot thoughts for the end of this year :-)
Date Tue, 26 Dec 2017 08:41:39 GMT
Hi guys,

Last September, I worked on Mavibot in order to add transactions to it.
It now works.

Transactions bring many advantages to the backend:
- they guarantee atomicity at an upper level (ie, across B-trees). This is
critical for the server, as a simple update (ADD/Modify/Delete/ModRDN)
impacts more than one B-tree
- we can gather many updates per transaction, which can be used either to
speed up updates if the user doesn't really care about losing a few of
them (a commit can be done every N updates or every M minutes, for
instance) or to inject a huge amount of data (like what we would do for a
bulkload). This second usage seems good to have, no matter what. FTR,
injecting 100 000 elements in a <int, String> B-tree takes 0.5 seconds
on my laptop. We can imagine that even if it takes 100 times more
processing to inject LDAP entries, that would mean 50 seconds to load
100 000 entries. You can compare that with the current 100 entries/s we
can get with JDBM, ie 1 000 seconds for the same load (20 times slower...)
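
To make the "commit every N updates" idea concrete, here is a minimal sketch in Python. It is only an illustration: `InMemoryStore` and `BatchingWriter` are hypothetical names invented for this example, not Mavibot's API, and the store is a plain dictionary standing in for a transactional backend.

```python
class InMemoryStore:
    """Hypothetical stand-in for a transactional backend (NOT Mavibot's API)."""

    def __init__(self):
        self.committed = {}    # data visible after a commit
        self.uncommitted = {}  # updates gathered in the current transaction
        self.commit_count = 0

    def put(self, key, value):
        self.uncommitted[key] = value

    def commit(self):
        # Make every pending update visible in one step.
        self.committed.update(self.uncommitted)
        self.uncommitted.clear()
        self.commit_count += 1


class BatchingWriter:
    """Hypothetical helper: commits once every `batch_size` updates."""

    def __init__(self, store, batch_size):
        self.store = store
        self.batch_size = batch_size
        self.pending = 0

    def update(self, key, value):
        self.store.put(key, value)
        self.pending += 1
        if self.pending >= self.batch_size:
            self.flush()

    def flush(self):
        # Commit whatever is pending; a no-op when nothing is pending.
        if self.pending:
            self.store.commit()
            self.pending = 0
```

With a batch size of 1 000, loading 2 500 elements triggers two commits, and a final `flush()` commits the remaining 500. Anything not yet flushed is what the user accepts to lose on a crash.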

Ok, anyway, we all know that Mavibot is really badly needed.

The thing I have in mind atm is that even if Mavibot is not completed
(free pages aren't managed, dead versions aren't removed), we can still
benefit from it.

The fact that we don't clean up dead revisions is not necessarily
critical: we can turn that into a feature, namely the ability to
fetch data at a given revision/date. In a system where auditing is
critical, that would be a plus.
One of the biggest issues with keeping all the revisions is that the
database will grow fast, but we can mitigate this:
- first, with transactions, we update the management B-tree only once per
transaction instead of once per B-tree update, so even if we don't defer
transaction commits, we still limit the growth rate
- second, we can defer commits, as I explained above.

It's not perfect, but if we consider a 100 000 entries database with 20
indexes, that would mean around 5 * 20 * 1024 * N bytes added for N
updates per day (20 indexes updated, 5 levels updated, 1024-byte pages).
If N is 100, which is conservative for an LDAP server, this adds about
10 MB/day to the database, less than 4 GB/year. Ok, I know this is a
back-of-the-envelope calculation, but that gives an idea.
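
The estimate above can be checked with a few lines of Python. The function name and its parameter names are made up for this sketch; the default values are the ones used in the calculation (20 indexes, 5 levels, 1024-byte pages).

```python
def daily_growth_bytes(updates_per_day, indexes=20, levels=5, page_size=1024):
    """Bytes added per day if every update rewrites `levels` pages in each index."""
    return levels * indexes * page_size * updates_per_day


per_day = daily_growth_bytes(100)  # N = 100 updates per day
per_year = per_day * 365

print(f"{per_day / 1e6:.2f} MB/day, {per_year / 1e9:.2f} GB/year")
```

For N = 100 this gives 10 240 000 bytes per day and 3 737 600 000 bytes per year, which matches the "about 10 MB/day, less than 4 GB/year" figure.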

In any case, I do think it makes sense to offer this option to our
users, who have been suffering for years from JDBM data corruption.

Here is what I would propose :
- add the Mavibot V2 backend, as is, with all the pros and cons
- implement LDAP transactions as specified in RFC 5805. This will be used
for batched updates (kind of bulkload)
- keep JDBM as is
- add a system to shrink the database (either offline or online, see the
end of this mail)

The biggest issue with keeping JDBM is that we will have to keep some of
the locks we have added, so getting rid of them for Mavibot might be a
bit of a pain.


In the longer term, I will implement free page management/old version
removal in Mavibot, removing the growing database issue.

Regarding the Mavibot shrinking tool, I have some ideas about it.
Basically, we need to be able to get rid of dead revisions (ie, revisions
we know are no longer in use). Removing old revisions is easy, the
problem is to be sure they aren't used anymore. Doing so on an offline
database is trivial: we can delete all of them without risking losing
anything.
On an online database, that means we have to keep track of the live read
transactions. We can safely delete all the revisions that are older than
the oldest revision still used by a read transaction. That being said,
shrinking the database while it's in use is just a matter of blocking all
writes, creating a new database and injecting the latest version into it
(which may take a bit of time). Once done, we switch to the new database
for new operations, but keep the old one for ongoing operations, and when
no operation is using the old database anymore, we can delete it. This is
slightly more complex, so I'm not sure it's worth the effort, and I'd
rather spend my limited time on adding the free pages/old version
management in Mavibot.
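
The rule "delete everything older than the oldest revision still being read" can be sketched as follows. This is a minimal illustration in Python under stated assumptions: revisions are plain increasing integers, the latest revision is always kept, and `RevisionTracker` is a hypothetical name, not something that exists in Mavibot.

```python
import threading


class RevisionTracker:
    """Hypothetical bookkeeping of live read transactions, keyed by revision."""

    def __init__(self):
        self._lock = threading.Lock()
        self._readers = {}  # revision -> number of live read transactions

    def begin_read(self, revision):
        with self._lock:
            self._readers[revision] = self._readers.get(revision, 0) + 1

    def end_read(self, revision):
        with self._lock:
            remaining = self._readers[revision] - 1
            if remaining:
                self._readers[revision] = remaining
            else:
                del self._readers[revision]

    def removable(self, all_revisions, latest):
        """Revisions strictly older than the oldest one still being read.

        With no live reader, everything older than `latest` is removable.
        """
        with self._lock:
            oldest_in_use = min(self._readers, default=latest)
        return [r for r in all_revisions if r < oldest_in_use]
```

For example, with revisions 1 to 7 and readers on revisions 3 and 5, only revisions 1 and 2 can go. Once the reader on revision 3 finishes, revisions 3 and 4 become removable as well.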


Those are my Xmas thoughts, please feel free to comment!

Thanks !


-- 
Emmanuel Lecharny

Symas.com
directory.apache.org

