directory-dev mailing list archives

From Stefan Seelmann <m...@stefan-seelmann.de>
Subject Re: ApacheDS & Mavibot thoughts for the end of this year :-)
Date Wed, 27 Dec 2017 22:53:01 GMT
On 12/26/2017 09:41 AM, Emmanuel Lécharny wrote:
> Hi guys,
> 
> Last September, I worked on Mavibot in order to add transactions to it.
> It now works.
> 
> Transactions bring many advantages to the backend :
> - it guarantees atomicity at a higher level (i.e., across B-trees). This is
> critical for the server, as a simple update (Add/Modify/Delete/ModRDN)
> impacts more than one B-tree
> - we can gather many updates per transaction, which can be used either to
> speed up updates if the user doesn't really care about losing a few of
> them (a commit can be done every N updates or every M minutes, for
> instance) or to inject a huge amount of data (like what we would do for a
> bulk load). This second usage seems good to have, no matter what. FTR,
> injecting 100 000 elements in a <int, String> B-tree takes 0.5 seconds
> on my laptop. We can imagine that even if it takes 100 times more
> processing to inject LDAP entries, that would mean 50 seconds to load
> 100 000 entries. You can compare that with the current 100 entries/s we
> can get with JDBM... (20 times slower...)
> 
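
As a rough illustration of the "commit every N updates or every M minutes" idea above, here is a minimal Java sketch. CommitPolicy and its methods are hypothetical names, not part of the Mavibot API; the caller is assumed to own the real write transaction and to commit it whenever recordUpdate() returns true.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper (not Mavibot API): decides when pending updates should be committed. */
final class CommitPolicy {
    private final int maxUpdates;     // N: commit once this many updates are pending
    private final long maxAgeMillis;  // M: commit once the oldest pending update is this old
    private final LongSupplier clock; // injectable clock, e.g. System::currentTimeMillis
    private int pending;
    private long firstPendingAt;

    CommitPolicy(int maxUpdates, long maxAgeMillis, LongSupplier clock) {
        this.maxUpdates = maxUpdates;
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    /** Records one update; returns true if the caller should commit now. */
    boolean recordUpdate() {
        long now = clock.getAsLong();
        if (pending == 0) {
            firstPendingAt = now; // first update of a new batch
        }
        pending++;
        if (pending >= maxUpdates || now - firstPendingAt >= maxAgeMillis) {
            pending = 0; // the caller commits, so the batch starts over
            return true;
        }
        return false;
    }

    /** Number of updates recorded since the last commit signal. */
    int pending() {
        return pending;
    }
}
```

A check driven only by updates never fires while the server is idle, so a real implementation would presumably also need a timer to flush the last pending updates. Whatever is still pending is what the user accepts to lose on a crash.
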
> Ok, anyway, we all know that Mavibot is really badly needed.
> 
> The thing I have in mind atm is that even if Mavibot is not completed
> (free pages aren't managed, dead versions aren't removed), we can still
> benefit from it.
> 
> The fact that we don't clean up dead revisions is not necessarily
> critical: we can turn that into a feature, namely the ability to
> fetch data at a given revision/date. In a system where audit is
> critical, that would be a plus.
> One of the biggest issues with keeping all the revisions is that the
> database will grow fast, but we can mitigate this:
> - first, with transactions, we update the management B-tree only once
> per transaction instead of once per B-tree update, so even if we don't
> defer transaction commits, we still limit the growth rate
> - second, we can defer commits, as I explained above.
> 
> It's not perfect, but if we consider a 100 000-entry database with 20
> indexes, that would mean around 5 * 20 * 1024 * N bytes added for N
> updates per day (20 indexes updated, 5 levels updated, 1024-byte pages).
> If N is 100, which is conservative for an LDAP server, this adds about
> 10 MB/day to the database, less than 4 GB/year. Ok, I know this is a
> back-of-the-envelope calculation, but that gives an idea.
> 
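
The estimate above is easy to replay. The sketch below only restates the quoted formula (levels * indexes * page size * updates per day); the class and method names are made up for the example, and it assumes every update rewrites one page per level in every index, as the mail does.

```java
/** Back-of-the-envelope growth estimate from the mail; names are illustrative only. */
final class GrowthEstimate {
    /** Bytes added per day if each update copies one page per level in every index. */
    static long bytesPerDay(int levels, int indexes, int pageSizeBytes, long updatesPerDay) {
        return (long) levels * indexes * pageSizeBytes * updatesPerDay;
    }

    public static void main(String[] args) {
        long perDay = bytesPerDay(5, 20, 1024, 100);
        long perYear = perDay * 365;
        System.out.println(perDay + " bytes/day");   // 10240000, about 10 MB
        System.out.println(perYear + " bytes/year"); // 3737600000, about 3.7 GB
    }
}
```

With N = 100 this gives 10 240 000 bytes per day and about 3.7 GB per year, which matches the "less than 4 GB/year" figure.
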
> In any case, I do think it makes sense to offer this option to our
> users, who have been suffering for years from JDBM data corruption.
> 
> Here is what I would propose :
> - add the Mavibot V2 backend, as is, with all the pros and cons
> - implement LDAP transactions as specified in RFC 5805. This will be used
> for batched updates (kind of bulk load)
> - keep JDBM as is
> - add a system to shrink the database (either offline or online, see the
> end of this mail)

Is implementation of LDAP transactions required? I guess that is just an
additional feature on top of Mavibot. I mean, one can just use Mavibot
and benefit from a non-corrupt database without having transactions,
right?

> The biggest issue with keeping JDBM is that we will have to keep some of
> the locks we have added, so getting rid of them for Mavibot might be a
> bit of a pain.

By "keep JDBM", do you mean the user can decide via configuration whether
to use JDBM or Mavibot? What do you suggest as the default?

> In the longer term, I will implement free page management/ old version
> removal in Mavibot, removing the growing database issue.
> 
> Regarding the Mavibot shrinking tool, I have some ideas about it.
> Basically, we need to be able to get rid of dead revisions (i.e., revisions
> we know are no longer in use). Removing old revisions is easy; the
> problem is being sure they aren't used anymore. Doing so on an offline
> database is trivial: we can delete all of them without risking losing
> anything.
> On an online database, that means we have to keep track of live read
> transactions. We can safely delete all the revisions that are older than
> the oldest revision used by a read transaction. That being said, shrinking
> the database while it's in use is just a matter of blocking all writes,
> creating a new database and injecting the latest version into it (which may
> take a bit of time). Once done, we switch to the new database for new
> operations, but keep the old one for ongoing operations, and when no
> operation is using the old database anymore, we can delete it. This is
> slightly more complex, so I'm not sure it's worth the effort, and I'd
> rather spend my limited time on adding the free page/old version
> management in Mavibot.

I agree, better to spend time implementing the "right" thing than on
workarounds.
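
For reference, the online rule quoted above (revisions older than the oldest one still used by a read transaction can be deleted) could be tracked along these lines. This is only a sketch: ReadTxnTracker is a hypothetical class, not Mavibot code, and it assumes readers register the revision they pin when they start and release it when they finish.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical tracker of live read transactions, keyed by the revision they read. */
final class ReadTxnTracker {
    // revision -> number of read transactions currently pinned to it, sorted by revision
    private final ConcurrentSkipListMap<Long, Integer> active = new ConcurrentSkipListMap<>();

    /** Called when a read transaction starts on the given revision. */
    void begin(long revision) {
        active.merge(revision, 1, Integer::sum);
    }

    /** Called when a read transaction on the given revision ends. */
    void end(long revision) {
        // Returning null removes the entry once the last reader is gone.
        active.computeIfPresent(revision, (r, n) -> n == 1 ? null : n - 1);
    }

    /**
     * Oldest revision that must be kept. Everything strictly older has no
     * reader and is a candidate for removal.
     */
    long oldestInUse(long currentRevision) {
        Map.Entry<Long, Integer> oldest = active.firstEntry(); // null when no reader is active
        return oldest == null ? currentRevision : Math.min(oldest.getKey(), currentRevision);
    }
}
```

A real implementation would also have to make "pick the current revision" and begin() atomic with respect to the cleaner, otherwise a revision could be removed just before a reader registers on it.
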

