directory-dev mailing list archives

From Emmanuel Lecharny <>
Subject Backend recovery
Date Thu, 20 Nov 2008 14:27:40 GMT
Hi guys,

as I'm currently working on a recovery tool, I have reviewed the current 
configuration we have. Let's first analyse what a recovery tool is good 
for and how it should work.

1) Usage
We need to rebuild the indices if the database is corrupted. This can 
happen on many occasions, mainly when the server is brutally interrupted 
(power shutdown, for instance). As indices are dynamically built while 
injecting new entries (or when updating those entries), it is important 
to have such a tool to rebuild them when the indices are not up to 
date, as otherwise we may have orphan entries, or even worse, indices 
pointing at non-existing entries or wrong entries.

Another usage for such a tool would be to create indices offline. This 
has the great advantage of allowing a mass injection of entries into the 
master table (the table containing all the entries), followed by a global 
re-indexing, potentially avoiding a lot of expensive controls, as we may 
run a pre-check on the data before doing the injection. (Of course, by 
adjusting the number of checks to run, we can go faster or slower, 
depending on how much we trust the data we inject into the server.)
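To illustrate the offline re-indexing idea, here is a minimal sketch in Java. It assumes a deliberately simplified model (the master table as a map from entry ID to attributes, the index as a reverse map); these types are hypothetical stand-ins, not the actual ApacheDS backend classes.

```java
import java.util.*;

public class OfflineReindex {

    // Rebuild the index for one attribute by scanning the whole master
    // table: for each entry holding the attribute, record its ID under
    // the attribute value. No per-entry controls are run here; the
    // pre-check on the data is assumed to have happened before injection.
    static Map<String, Set<Long>> rebuildIndex(
            Map<Long, Map<String, String>> masterTable, String attribute) {
        Map<String, Set<Long>> index = new HashMap<>();
        for (Map.Entry<Long, Map<String, String>> e : masterTable.entrySet()) {
            String value = e.getValue().get(attribute);
            if (value != null) {
                index.computeIfAbsent(value, k -> new HashSet<>()).add(e.getKey());
            }
        }
        return index;
    }
}
```

The point is that a full scan of the master table is cheap compared to maintaining the index incrementally under the server's usual checks during a bulk load.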

2) How it should work
This is where we have a real problem. Assuming that the backend storage 
can be totally corrupted, we can't really trust it to rebuild the indices. 
What are our options? Let's see how the system works when we modify some 
entries:
- the database is in a state S0 at t0 (let's say when we have started 
the server for the very first time)
- each added entry changes the current state. As we may use deferred 
writes, this state is written to disk only every N seconds (unless 
deferred writes are disabled)
- at some point, we are in state Si, and we have some more modifications
- then we have a crash. It can occur before the current modifications 
are written to disk, in which case we have lost every modification since 
state Si (a), or, worse, it occurs while we are writing those 
modifications, and now we can have a totally unstable base (b).

Case (a) can be handled, as the base is not corrupted. The problem is 
that we have lost some data, which may matter. However, if we don't use 
deferred writes, we can avoid such a case (except for the last 
modification), with the major inconvenience that we are now more likely 
to fall into case (b)...

Case (b) is more problematic, because we have no way to determine what 
the previous stable state (Si) was, nor to restore it, as the base has 
already been partially modified.

What is the solution? We have to assume that the state Si can be 
restored, and that we can apply every modification on top of it. The only 
way to do that is to combine two techniques:
- back up the base on a regular basis, assuming that it can be done 
without allowing any update on the base during the backup (and this is 
not obvious, as the base can be pretty big)
- and store every modification in a journal, to be able to replay them 
on the restored base.

(keep in mind that we are not using a transactional system).
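The backup + journal combination above can be sketched as follows, using an in-memory map as a stand-in for the master table. The Change record and the recovery signature are simplifications for illustration, not the ApacheDS types.

```java
import java.util.*;

public class JournalReplay {

    // One journal record: an entry ID plus its new content
    // (null content means the entry was deleted).
    record Change(long entryId, Map<String, String> newContent) {}

    // Recovery: start from the last clean backup (state Si), then replay
    // every journaled modification on top of it, in order. The corrupted
    // live base is never consulted.
    static Map<Long, Map<String, String>> recover(
            Map<Long, Map<String, String>> backup, List<Change> journal) {
        Map<Long, Map<String, String>> base = new HashMap<>(backup);
        for (Change c : journal) {
            if (c.newContent() == null) {
                base.remove(c.entryId());
            } else {
                base.put(c.entryId(), c.newContent());
            }
        }
        return base;
    }
}
```

Since we are not transactional, the correctness of this scheme rests entirely on the journal being synced independently of (and before) the partition writes.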

3) How does it translate for ADS ?

ADS has a ChangeLog interceptor which logs all the modified entries as a 
list of modifications. Each change has a unique revision (to be fixed, 
as we are not using a synchronized counter at the moment), and can be 
stored in a sequential text file, each modification being stored as an 
LDIF change operation (we currently don't have a file-based storage).
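For reference, a journal record in that flat file could be a standard LDIF change record (RFC 2849). A hypothetical example of what one modify operation might look like, with the revision kept as a comment (the exact framing is an assumption, not a decided format):

```
# revision: 42
dn: cn=John Doe,ou=people,dc=example,dc=com
changetype: modify
replace: mail
mail: john.doe@example.com
-
```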

The idea would be to add some way to save the underlying files, and then 
to apply all the stored logs on these files. We could even replay the 
whole journal from day one, but this would be total overkill. A third 
option, IMHO way better, as it eliminates the need to do a backup, would 
be to apply the log on a separate base, so that we don't need to do 
backups on the fly (of course, before applying the current logs, we 
should back up the spare files, and once the logs have been applied, 
ditch the N-1 backup). Here is the algorithm:

Journal last position is N, we have n modifications since then
Spare current base is version N

- time to apply the journal ! Mark the current position in the journal 
as N+1
- copy the spare base (Spare-N) to Spare-(N+1)
- apply the journal from position N to position N+1
- if everything is ok, tell the system that the current backup base is 
Spare-(N+1) and that the current log position is N+1
- now, we can ditch the Spare-N base

Journal last position is N+1 now, and we may have m modifications since 
then, as the server continues to log
Spare current base is version N+1
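The rotation above can be sketched as a small in-memory simulation. The journal is an append-only list, the spare base is the list of changes applied so far, and positions are indices into the journal; file copying and the actual LDIF application are elided, so this only demonstrates the bookkeeping.

```java
import java.util.*;

public class SpareRotation {

    List<String> journal = new ArrayList<>();  // append-only change log
    List<String> spare = new ArrayList<>();    // spare base at version N
    int appliedUpTo = 0;                       // journal position N

    // Apply the journal segment [N, N+1) to a copy of the spare base,
    // then swap: the copy becomes Spare-(N+1) and Spare-N can be
    // ditched (here, simply garbage collected). The server keeps
    // appending to the journal while this runs.
    void rotate() {
        int mark = journal.size();                        // mark position N+1
        List<String> next = new ArrayList<>(spare);       // copy Spare-N
        next.addAll(journal.subList(appliedUpTo, mark));  // replay N..N+1
        spare = next;             // current backup base is Spare-(N+1)
        appliedUpTo = mark;       // current log position is N+1
    }
}
```

Note that the mark is taken before the copy, so modifications logged during the rotation are simply picked up by the next rotation; nothing is lost as long as the journal itself is durable.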

In order to get this working in the server, we have a few things to do:
- decouple the Log sync from the partition sync (currently both are 
written at the same time)
- extend the ChangeLog to write in a flat file, injecting LDIF into it
- add a thread to implement the previous algorithm
- add a handler to run the previous algorithm if the server has crashed, 
when the server is restarted
- add a CL option to run the algo offline

A second advantage would be to allow a bulk load without all the time 
consuming controls we have in the server.

Thoughts ?

cordialement, regards,
Emmanuel Lécharny
