jackrabbit-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Jackrabbit Wiki] Update of "BackupTool" by Nico
Date Wed, 16 Aug 2006 19:18:28 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Jackrabbit Wiki" for change notification.

The following page has been changed by Nico:

- After some iterations on the ML and some discussion with Jukka, here is what we propose
to build this summer.
+ The backup tool v1 can back up a repository and restore it to a blank repository.
It uses existing storage/restore mechanisms as much as possible.
+ Currently we manage the following resources:
+  * Repository (repository.xml)
+  * Node Type
+  * Namespaces
+  * All workspaces (config and content)
+  * Node version histories
+  * Backup configuration
- Please feel free to comment (through the ML or this [http://www.contact-us.info/contact.php?id=18
contact form] to contact directly Nicolas Toper). 
+ Please feel free to comment (through the ML, or through this [http://www.contact-us.info/contact.php?id=18
contact form] to contact Nicolas Toper directly). We plan to work on a second version soon.
We are currently gathering feedback and new use cases.
+ == Design Goals ==
+  * Back up both data and configuration options.
+  * Ease of use for sysadmins backing up and restoring content.
+  * Aim for generic operations: adding new functionality to Jackrabbit should not
require updating the backup application code.
+  * Aim for modularity. This is the first release of the backup tool, and it will
certainly evolve.
+  * Disk space is not an issue for now. (It can be addressed in another release.)
+  * Performance is not an issue for now. (It can be addressed in another release.)
+ == Prerequisites & Misc. ==
+  * All operations are sequential. No multi-threading is currently involved.
+  * Repository must be stopped for backup/restore and dedicated to the backup/restore operations.
+  * The Backup tool source code is available via Subversion at
+    https://svn.apache.org/repos/asf/jackrabbit/trunk/contrib/backup/
+  and anonymous access is available at
+    http://svn.apache.org/repos/asf/jackrabbit/trunk/contrib/backup/
+  or with ViewVC at
+    http://svn.apache.org/viewvc/jackrabbit/trunk/contrib/backup/
+ == Backing Up A Repository ==
+ To launch a backup, please run the following command:
+  LaunchBackup --zip myzip.zip --conf backup.xml --login nico --password mlypass backup repository.xml
+ where zip is the name of the file to generate, backup.xml is the name of the XML configuration
file (if you are unsure how to configure it, please use the one included), and login and password
are the required userID and password.
+ == Restoring ==
+ To restore a repository, please prepare a blank repository (available through create repository), then run:
+  LaunchBackup --zip ./myzip.zip --conf backup.xml --login nico --password p restore repository.xml
+ where zip is the name of the backup file, backup.xml is the name of the XML configuration
file (if you are unsure how to configure it, please use the one included), and login and password
are the required userID and password.
+ repository.xml and repository/ point, respectively, to the repository.xml file and the
repository home to restore.
+  '''NB''' You can easily migrate one repository to another this way and change the PersistenceManager.
  == Architecture ==
- We want to ease the installation of the backup tool and keep the most flexibility when performing
a backup. 
+ We tried as much as possible to achieve symmetry between backup and restore. All classes
can be used for both backup and restore operations.
- Besides we need to support Jackrabbit installation in its three models: as a shared J2EE
resource, as a repository server and as a webapp bundle. Since in this last case we cannot
assume any communication layer, we have only one option on where to put the backup tool: as
a local backup and a local restore method in the standard org.apache.jackrabbit.api.JackrabbitRepositoryImpl
(o.a.j.core.RepositoryImpl ).
+ The backup tool is organized into the following main classes:
- The backup operation is quite flexible. It can be fired from everywhere (application, sysadmin,
remote client, ...). 
+  * '''Launch utility''' (LaunchBackup) The launch utility lets you launch a backup through
a cron job or the CLI. You can also integrate the backup into your application by instantiating
this class.
- The restore operation would be fired from a separate application which would be the only
one using the repository. There are a lot of constraint to restore but since it is a quite
rare operation, its impact is quite limited. 
+  * '''<Resource>Backup''' This is a collection of classes extending the abstract class
Backup. Each class is responsible for backing up and restoring a specific resource. For instance,
NodeTypeBackup is responsible for backing up and restoring all node types. To create another class
(for instance to back up Lucene indexes), extend Backup and implement its two abstract methods:
backup and restore. If you do so, please contribute the class back.
- Jackrabbit is still a new project. The functionality set would probably evolve. The backup
tool need to show good evolution capability so it can follow Jackrabbit evolutions. Jukka
and I propose to use a XML configuration file loaded with each save operation. The configuration
file would define what resource is to be saved (and therefore restored) and how (by pointing
to a class). This way, it is easy to add new kind of resource (and share the code with the
community) and create your own backup plan. If the API change a little bit, we would have
only to update one class. (The configuration file is backuped with the repository so the restore
operation know what to restore).
- The configuration file would look like this:.
+  * '''Manager''' manages the instantiation and handling of all <Resource>Backup classes
(please see NB). It knows which Backup subclasses to call through an XML configuration file.
You can therefore easily create your own customized backup.
+  * '''IOsystem''' The IOsystem is handled through an interface (BackupIOHandler) and its
implementation (ZipBackupIOHandler). It allows us to easily improve the IOsystem without impacting
other parts of the code.
+  '''NB''' The restore operations for the backup configuration and the repository are special:
they are mandatory, and the other restore operations cannot take place without them. Therefore,
LaunchBackup calls those two classes directly so that the restore operation can continue.
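
As an illustration of the <Resource>Backup idea described above, here is a minimal Java sketch. It does not use the real backup tool classes: the Backup and IoHandler types below are hypothetical stand-ins, and the method signatures, NoteBackup, and MemoryIoHandler are assumptions made only so that the example compiles on its own. It shows one class handling both the backup and the restore of a single resource.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class CustomBackupSketch {

    /** Hypothetical stand-in for the tool's I/O abstraction (not the real BackupIOHandler). */
    interface IoHandler {
        void write(String entryName, byte[] data) throws IOException;
        byte[] read(String entryName) throws IOException;
    }

    /** Hypothetical stand-in for the abstract Backup class and its two abstract methods. */
    abstract static class Backup {
        abstract void backup(IoHandler io) throws IOException;
        abstract void restore(IoHandler io) throws IOException;
    }

    /** Example resource class: saves and reloads a single string value. */
    static class NoteBackup extends Backup {
        String note;

        NoteBackup(String note) {
            this.note = note;
        }

        @Override
        void backup(IoHandler io) throws IOException {
            io.write("note.txt", note.getBytes(StandardCharsets.UTF_8));
        }

        @Override
        void restore(IoHandler io) throws IOException {
            note = new String(io.read("note.txt"), StandardCharsets.UTF_8);
        }
    }

    /** In-memory handler, used here in place of a zip file. */
    static class MemoryIoHandler implements IoHandler {
        private final Map<String, byte[]> entries = new HashMap<>();

        @Override
        public void write(String entryName, byte[] data) {
            entries.put(entryName, data.clone());
        }

        @Override
        public byte[] read(String entryName) throws IOException {
            byte[] data = entries.get(entryName);
            if (data == null) {
                throw new IOException("No such entry: " + entryName);
            }
            return data.clone();
        }
    }

    /** Backs up a value with one instance and restores it into a fresh one. */
    static String roundTrip(String value) throws IOException {
        IoHandler io = new MemoryIoHandler();
        new NoteBackup(value).backup(io);
        NoteBackup restored = new NoteBackup(null);
        restored.restore(io);
        return restored.note;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hello"));
    }
}
```

Keeping backup and restore in the same class is what gives the symmetry mentioned earlier: whatever entry names a class writes, the same class knows how to read back.
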
+ == Configuration File ==
- <Backup>
- <WorkingFolder path="/home/" />
- <!-- For now only ObjectPersistenceManager and XMLPersistenceManager -->
- <PersistenceManager class="...">
-     <param name="...">...</param>
-     ...
- </PersistenceManager>
+ <Backup>
+ <WorkingFolder path="tmp/" />
-   <Ressources>
+   <Resources>
-     <Resource savingClass=”FQN backup class">
-        <!-- the resource class is fetching those parameters if needed. For now, only
one needs it -->
-        <param name=”param1” value=”1” />
+   <!-- The repository and the config file are automatically backed up -->
+     <Resource savingClass="org.apache.jackrabbit.backup.NodeTypeBackup" />
+     <Resource savingClass="org.apache.jackrabbit.backup.NamespaceBackup" />
+     <Resource savingClass="org.apache.jackrabbit.backup.NodeVersionHistoriesBackup" />
+     <Resource savingClass="org.apache.jackrabbit.backup.AllWorkspacesBackup" />
-     </Resource>
+    </Resources>
-     <Resource savingClass=”FQN backup class”/>
-     <Resource savingClass=”FQN backup class”/>
-   <Resource savingClass="BackupResourceAllWorkspaces" />
-   </Workspaces>
-   </Ressources>
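
To make the role of the savingClass attribute concrete, here is a small Java sketch that reads a configuration of this shape with the standard JAXP DOM parser and lists the configured class names. It is only an assumption about how such a file could be read, not the Manager's actual code. The class name BackupConfigSketch and the SAMPLE document (including its closing </Backup> tag, which the excerpt above does not show) are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class BackupConfigSketch {

    /** Sample configuration, assembled from the added lines of the wiki page. */
    static final String SAMPLE =
          "<Backup>"
        + "<WorkingFolder path=\"tmp/\" />"
        + "<Resources>"
        + "<Resource savingClass=\"org.apache.jackrabbit.backup.NodeTypeBackup\" />"
        + "<Resource savingClass=\"org.apache.jackrabbit.backup.NamespaceBackup\" />"
        + "<Resource savingClass=\"org.apache.jackrabbit.backup.NodeVersionHistoriesBackup\" />"
        + "<Resource savingClass=\"org.apache.jackrabbit.backup.AllWorkspacesBackup\" />"
        + "</Resources>"
        + "</Backup>";

    /** Returns the savingClass value of every Resource element, in document order. */
    static List<String> savingClasses(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // Matches elements named exactly "Resource"; the "Resources" wrapper is not included.
        NodeList resources = doc.getElementsByTagName("Resource");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < resources.getLength(); i++) {
            names.add(((Element) resources.item(i)).getAttribute("savingClass"));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        for (String name : savingClasses(SAMPLE)) {
            System.out.println(name);
        }
    }
}
```

A manager built along these lines could then load each listed class by name and call its backup or restore method in the order given by the file.
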
+ == Backup And Restore Operations ==
- As you can see, we can backup either all workspaces or a specific one and the same class
is used to save and restore a resource (we would be implementing a specific interface with
two methods: save and restore). The Javadoc is going to specify the dependency in the backup
class (using @link?).
- External parameters would be passed to the classes too (for instance to know where is the
- == Saved data ==
- For now, here is the data we plan to backup. It is only a first step and other resources
can be added easily.
- '''Backup Configuration file''' 
-  * As described upper.
- '''Jackrabbit Configuration File'''
-  * repository.xml
-  * workspace.xml
- '''Data'''
-  * All workspaces
-  * Node version histories (we will backup the workspace directly contrary to what I have
written before).
-  * Custom Node Types
-  * Namespace
- '''NB'''
- We will not save Lucene index for now.
- == Save and restore algorithm ==
  -	The configuration files is saved as files.
- -	The workspaces (and node version histories) is transferred to a specific workspace (SavingWorkspace)
using ObjectPM or XmlPM. We would zip the directory, copy it and destroy the workspace. 
+ -	The workspaces (and node version histories) is exported using t to a specific workspace
(SavingWorkspace) using ObjectPM or XmlPM. We would zip the directory, copy it and destroy
the workspace. 
  -	Other resources (custom node types and namespaces) are saved and serialized using Jackrabbit's
internal xml node type serialization format (NodeTypeWriter and NodeTypeReader for instance).
  - We would then zip everything in the working folder move it as a stream to RepositoryImpl.
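
The last step above (zipping everything in the working folder) could look roughly like the following Java sketch, which uses only java.util.zip and java.nio.file from the standard library. It is not taken from ZipBackupIOHandler; the class and method names are assumptions chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipWorkingFolderSketch {

    /**
     * Writes every regular file under folder into a zip stream on out,
     * using paths relative to folder as entry names.
     * Closes out when done and returns the number of files written.
     */
    static int zipFolder(Path folder, OutputStream out) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(folder)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            for (Path file : files) {
                // Zip entry names use forward slashes on every platform.
                String name = folder.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
        return files.size();
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway working folder with one file, then zip it in memory.
        Path work = Files.createTempDirectory("backup-work");
        Files.write(work.resolve("repository.xml"), "<Repository/>".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int count = zipFolder(work, out);
        System.out.println(count + " file(s) zipped");
    }
}
```

Because the method writes to a plain OutputStream, the same code can target a file on disk or any other stream the caller provides.
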

- == Locking strategy ==
- For the backup operation, locking will be managed on the backup class level. There is no
need for now to hold a global lock for now. We will put a JCR deep lock on the root node of
the workspace we are currently saving. If a lock is already held, we would raise an exception
(but not kill the backup; it would proceed without the specified workspace).
- About the Jackrabbit conf files, they are not modified by any processes (even workspace.xml?)
so there is no issue.
- About the other resources (custom node types and namespaces), it is (is it?) managed by
the already present code.
- For the restore operation, there is no issue since, the restore tool would be the only one
using the repository. It is application/syadmin responsibility to enforce this behaviour.
- == Classes ==
- Here is a first class diagram. It is far from being over but the structure is here.
- http://www.deviant-abstraction.net/wp-content/uploads/2006/06/classdiagram.gif
- As you can see everything is based on two classes Backup and BackupConfig. BackupConfig
fetches the XML file and instantiate the Backup class. As you see BackupRepository is just
a specialized Backup class. BackupRepository holds reference to all Backup configured classes
(ie BackupWorskpace).
- Special classes: BackupAllWorkspaces and BackupNodeVersionHistories link directly to BackupWorkspace
for simplicity purposes.
- BackupRepositoryConfig backup the repository.xml file and the properties file. If there
is one for the workspace we will use it.
- I can see for now two design patterns: a Builder slightly modified and a Facade.
- == Next Steps ==
-  * coding
-  * test and debugging
- Each phase is separated by an iteration on the ML to gather your feedback
  === Evolution ===
- After the first release, here are some evolutions ideas
+ Here are some ideas for future evolution; please feel free to comment here or on the ML. We plan
to implement them soon.
   * Remove the need for the working folder. Use only streams.
   * Add asynchronous I/O (synchronous for now only since there are only a few resources to
@@ -146, +136 @@

        If collision, REPLACE_EXISTING = 2;
        Or throw an exception= 3; -->
      	<param name="uuidBehavior" value="0"/>)
- * Partial restore
+  * Partial restore
  '''Please contact Nicolas Toper (through the ML or this [http://www.contact-us.info/contact.php?id=18
contact form]) on any question/suggestion/idea on this project'''
