jackrabbit-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Jackrabbit Wiki] Update of "BackupTool" by Nico
Date Wed, 31 May 2006 20:07:23 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The following page has been changed by Nico:

New page:
After some iterations on the ML and some discussion with Jukka, here is what we propose to
build this summer.

Please feel free to comment, either through the ML or by contacting Nicolas Toper directly through this
[http://www.contact-us.info/contact.php?id=18 contact form].

== Architecture ==

We want to make the backup tool easy to install while keeping as much flexibility as possible
when performing a backup.

Besides, we need to support the three Jackrabbit deployment models: as a shared J2EE resource,
as a repository server and as a webapp bundle. Since in the last case we cannot assume any
communication layer, we have only one option for where to put the backup tool: as a local backup
method, and maybe a local restore method, in the standard org.apache.jackrabbit.api.JackrabbitRepository
interface. Jukka? Which class should implement this interface? RepositoryImpl in the core package?

The backup operation is quite flexible. It can be triggered from anywhere (application, sysadmin,
remote client, …).

The restore operation would be triggered from a separate application, which would be the only one
using the repository. Restoring comes with many constraints, but since it is a rare
operation, their impact is limited.

Jackrabbit is still a young project, and its feature set will probably evolve. The backup
tool needs to evolve easily so that it can follow Jackrabbit's changes. Jukka
and I propose to use an XML configuration file loaded with each save operation. The configuration
file would define which resources are to be saved (and therefore restored) and how (by pointing
to a class). This way, it is easy to add a new kind of resource (and share the code with the
community) and to create your own backup plan. If the API changes a little, we would
only have to update one class. (The configuration file is backed up with the repository so that
the restore operation knows what to restore.)

For instance, the configuration file could look like this (it is not a proposal yet, just
an example; I will propose a real format later):



<rabbitHole>

<param name="login" value="***"/>

<param name="password" value="***"/>

<resource name="custom node type" savingClass="FQN backup class"/>

<workspaces type="selected|all">

<workspace name="wsp1"/>

<workspace name="wsp2"/>

<workspace name="wsp3"/>

</workspaces>

</rabbitHole>

As you can see, we can back up either all workspaces or only selected ones, and the same class is
used to save and restore a resource (we would implement a specific interface with two
methods: backup and restore). The Javadoc will specify the dependency from the save/restore
class to the “main” one (using @link?).
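
As a rough illustration of this idea, here is a minimal Java sketch of such an interface and of
loading a handler from the class name given in the configuration file. The names ResourceBackup,
NamespaceBackup and BackupSketch, as well as the Map used as backup store, are hypothetical and
exist only for this example; the real interface still has to be designed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical contract: one class both saves and restores a given kind of resource.
interface ResourceBackup {
    void backup(Map<String, String> store) throws Exception;
    void restore(Map<String, String> store) throws Exception;
}

// Toy handler standing in for a real one; it only copies a string in and out of the store.
class NamespaceBackup implements ResourceBackup {
    String registry = "ns1=http://example.org/ns1";

    public void backup(Map<String, String> store) {
        store.put("namespaces", registry);
    }

    public void restore(Map<String, String> store) {
        registry = store.get("namespaces");
    }
}

class BackupSketch {
    // Instantiates the handler named by a savingClass attribute through reflection.
    static ResourceBackup load(String savingClass) throws Exception {
        return Class.forName(savingClass)
                .asSubclass(ResourceBackup.class)
                .getDeclaredConstructor()
                .newInstance();
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> store = new LinkedHashMap<>();
        // In the real tool the class name would come from the configuration file.
        ResourceBackup handler = load(NamespaceBackup.class.getName());
        handler.backup(store);
        handler.restore(store);
        System.out.println(store);
    }
}
```

With this shape, supporting a new kind of resource only requires writing one class and naming it
in the configuration file.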

External parameters would be passed to the classes too (for instance to know where is the

== Saved data ==
For now, here is the data we plan to back up. It is only a first step and other resources can
be added easily.

'''Backup Configuration file''' 

 * As described above.

'''Jackrabbit Configuration File'''

 * repository.xml

 * workspace.xml

 * All workspaces

 * Node version histories (we will back up the workspace directly, contrary to what I have written

 * Custom Node Types

 * Namespaces


We will not save the Lucene index for now.

== Save and restore algorithm ==
-	The configuration files are saved as files.

-	The workspaces (and node version histories) are transferred to a specific workspace (SavingWorkspace)
using ObjectPM or XmlPM. We would zip the directory, copy it and destroy the workspace.

-	Other resources (custom node types and namespaces) are saved and serialized using Jackrabbit's
internal XML node type serialization format (NodeTypeWriter and NodeTypeReader for instance).
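
The "zip the directory" step could be sketched as follows with the standard java.util.zip classes.
This is only an illustration: the class name ZipDirectorySketch, the method zipDirectory and the
temporary paths are made up for the example, and it assumes the archive is written outside the
directory being zipped.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class ZipDirectorySketch {
    // Writes every regular file under sourceDir into zipFile, with entry names relative to sourceDir.
    static void zipDirectory(Path sourceDir, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceDir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (OutputStream out = Files.newOutputStream(zipFile);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // ZIP entry names use forward slashes on every platform.
                String name = sourceDir.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the persisted SavingWorkspace directory.
        Path dir = Files.createTempDirectory("SavingWorkspace");
        Files.write(dir.resolve("data.xml"), "<node/>".getBytes(StandardCharsets.UTF_8));
        Path zip = Files.createTempFile("backup", ".zip");
        zipDirectory(dir, zip);
        System.out.println("Wrote " + zip);
    }
}
```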

== Locking strategy ==

For the backup operation, locking will be managed at the backup class level. There is no need
to hold a global lock for now. We will put a JCR deep lock on the root node of the
workspace we are currently saving. If a lock is already held, we would raise an exception
(but not kill the backup; it would proceed without that workspace).
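
The "skip the locked workspace but keep going" behaviour could look roughly like the sketch below.
It deliberately does not use the JCR API: WorkspaceLocker, AlreadyLockedException, InMemoryLocker
and LockingSketch are hypothetical stand-ins invented for this example. In the real tool the lock
would presumably be a deep lock taken on the workspace root node through the JCR locking methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for locking a workspace root node; not part of JCR or Jackrabbit.
interface WorkspaceLocker {
    void lockRoot(String workspace) throws AlreadyLockedException;
    void unlockRoot(String workspace);
}

class AlreadyLockedException extends Exception {
    AlreadyLockedException(String workspace) {
        super("Already locked: " + workspace);
    }
}

// Toy locker that remembers which workspaces are currently locked.
class InMemoryLocker implements WorkspaceLocker {
    private final Set<String> held = new HashSet<>();

    InMemoryLocker(String... alreadyLocked) {
        held.addAll(Arrays.asList(alreadyLocked));
    }

    public void lockRoot(String workspace) throws AlreadyLockedException {
        if (!held.add(workspace)) {
            throw new AlreadyLockedException(workspace);
        }
    }

    public void unlockRoot(String workspace) {
        held.remove(workspace);
    }

    boolean isLocked(String workspace) {
        return held.contains(workspace);
    }
}

class LockingSketch {
    // Saves each workspace under its own lock and returns the workspaces that were skipped.
    static List<String> backupAll(List<String> workspaces, WorkspaceLocker locker, List<String> saved) {
        List<String> skipped = new ArrayList<>();
        for (String ws : workspaces) {
            try {
                locker.lockRoot(ws);
            } catch (AlreadyLockedException e) {
                skipped.add(ws); // report the problem, but do not abort the whole backup
                continue;
            }
            try {
                saved.add(ws); // placeholder for the real save of this workspace
            } finally {
                locker.unlockRoot(ws); // release only the lock we took ourselves
            }
        }
        return skipped;
    }

    public static void main(String[] args) {
        List<String> saved = new ArrayList<>();
        List<String> skipped = backupAll(
                Arrays.asList("wsp1", "wsp2", "wsp3"), new InMemoryLocker("wsp2"), saved);
        System.out.println("saved=" + saved + " skipped=" + skipped);
    }
}
```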

The Jackrabbit configuration files are not modified by any process (even workspace.xml?),
so there is no issue.

For the other resources (custom node types and namespaces), locking is (is it?) managed by the
existing code.

For the restore operation, there is no issue since the restore tool would be the only one
using the repository. It is the application's or sysadmin's responsibility to enforce this behaviour.

== Next Steps ==

 * exact format of the XML configuration file
 * code design phase: UML Class diagram. 
 * schedule
 * coding

Each phase is separated by an iteration on the ML to gather your feedback.

=== Evolution ===
After the first release, here are some ideas for further evolution:

 * Add a remote client using either a dedicated RMI connection or the JCR one.  
 * Add support later for a restore operation while the repository is still in operation by
rewriting the local restore operation and its client.
 * Hot backup (see post on the ML on this subject)
 * Incremental backup (using rsync?)
 * Backup Lucene Index (see post on Lucene ML about saving indexes)

'''Please do not hesitate to contact Nicolas Toper (through the ML or this [http://www.contact-us.info/contact.php?id=18
contact form]) with any question, suggestion or idea about this project.'''
