subversion-users mailing list archives

From: Andy Canfield <andy.canfi...@pimco.mobi>
Subject: Re: My Backup Script
Date: Tue, 26 Jul 2011 22:14:29 GMT

On 07/27/2011 01:34 AM, Nico Kadel-Garcia wrote:
> On Tue, Jul 26, 2011 at 2:33 AM, Andy Canfield <andy.canfield@pimco.mobi> wrote:
>> For your information, this is my backup script. It produces a zip file that
>> can be transported to another computer. The zip file unpacks into a
>> repository collection, giving, for each repository, a hotcopy of the
>> repository and a dump of the repository. The hotcopy can be reloaded on a
>> computer with the same characteristics as the original server; the dumps can
>> be loaded onto a different computer. Comments are welcome.
> Andy, can we love you to pieces for giving us a new admin to educate
> in subtleties?
Sure! I'm good at being ignorant. FYI I have a BS in Computer Science 
from about 1970 and an MS in Operations Research from 1972, and I worked 
in Silicon Valley until I moved to Thailand in 1990. So although I am 
not stupid, I can be very ignorant.

And also the IT environment here is quite different. For example, MySQL 
can sync databases if you've got a 100Mbps link. Ha ha. I invented a way 
to sync two MySQL databases hourly over an unreliable link that ran at 
about modem speeds. I can remember making a driver climb a flagpole to 
make a cell phone call because the signal didn't reach the ground. To 
this day we run portable computers out in the field and communicate via 
floppynet. In this region hardware costs more than people, and software 
often costs nothing.

>> #! /bin/bash
>>
>> # requires root access
>> if [ ! `whoami` == root ]
>> then
>>     sudo $0
>>     exit
>> fi
>>
>> # controlling parameters
>> SRCE=/data/svn
>> ls -ld $SRCE
>> DEST=/data/svnbackup
>> APACHE_USER=www-data
>> APACHE_GROUP=www-data
> Unless the repository is only readable owned by root, this should
> *NOT* run as root. Seriously. Never do things as the root user that
> you don't have to. If the repository owner is "svn" or "www-data" as
> you've described previously, execute this as the relevant repository
> owner.
There are reasonable justifications for running it as root:
[1] Other maintenance scripts must be run as root, and this puts all 
maintenance in a central pool. My maintenance scripts are crontab jobs 
of the form /root/bin/TaskName.job, which runs /root/bin/TaskName.sh and 
pipes all stderr and stdout to /root/TaskName.out. Thus I can skim 
/root/*.out and have all the job status information at my fingertips. 
(A sketch of such a wrapper appears after this list.)
[2] For some tasks, /root/bin/TaskName.job is also responsible for 
appending /root/TaskName.out to /root/TaskName.all so that I can see 
earlier outputs. There is a job that erases /root/*.all the first of 
every month.
[3] I have heard for a long time that you should never run a GUI as 
root. None of these maintenance scripts are GUIs.
[4] There are many failure modes that will only arise if it is run as 
non-root. For example, if run as root, the command "rm -rf 
/data/svnbackup" will absolutely, for sure, get rid of any existing 
/data/svnbackup, whoever owns it and whatever junk is inside it.
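
For what it's worth, the wrapper pattern from [1] and [2] is roughly 
this (just a sketch; the file names are illustrative, not my real ones):

     #! /bin/bash
     # /root/bin/SVNBackup.job - crontab wrapper (sketch)
     # run the real script, capturing stdout and stderr in one file
     /root/bin/SVNBackup.sh > /root/SVNBackup.out 2>&1
     # append today's output to the running history
     cat /root/SVNBackup.out >> /root/SVNBackup.all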

>> # Construct a new empty SVNParent repository collection
>> rm -rf $DEST
>> mkdir $DEST
>> chown $APACHE_USER $DEST
>> chgrp $APACHE_GROUP $DEST
>> chmod 0700 $DEST
>> ls -ld $DEST
> And do..... what? You've not actually confirmed that this has succeeded
> unless you do something if these bits fail.
Many of your comments seem to imply that this script has not been 
tested. Of course it's been tested already, and in any production 
environment it will be tested again. And if stdout and stderr are piped 
to /root/SVNBackup.out then I can check that output text reasonably 
often and see that it is still running. In this case I would check it 
daily for a week, weekly for a month or two, yearly forever, and every 
time somebody creates a new repository.
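
That said, if I wanted the script to stop and complain instead of 
ploughing on after a failed mkdir, a couple of lines would do it; an 
untested sketch:

     # abort on the first failed command, and treat unset variables as errors
     set -e
     set -u
     rm -rf "$DEST"
     mkdir "$DEST" || { echo "FATAL: cannot create $DEST" >&2; exit 1; }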

Also, by the standards of this part of the world, losing a day's work is 
not a catastrophe. Most people can remember what they did, and do it 
again, and it probably only takes a half-day to redo.

>> # Get all the names of all the repositories
>> # (Also gets names of any other entry in the SVNParent directory)
>> cd $SRCE
>> ls -d1 *>/tmp/SVNBackup.tmp
> And *HERE* is where you start becoming a dead man if mkdir $DEST
> failed. I believe that it works in your current environment, but if
> the parent of $DEST does not exist, you're now officially in deep
> danger executing these operations in whatever directory the script was
> run from.
As noted above, $DEST is /data/svnbackup. The parent of $DEST is /data. 
/data is a partition on the server. If that partition is gone, that's a 
failure that we're talking about recovering from.
>> # Process each repository
>> for REPO in `cat /tmp/SVNBackup.tmp`
> And again you're in trouble. If any of the repositories have
> whitespace in their names, or funky EOL characters, the individual
> words will be parsed as individual arguments.
This is Linux. Anyone who creates a repository with white space in the 
name gets shot.
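
(For the record, if I ever did have to survive odd names, the loop could 
walk the directory entries directly instead of going through ls and a 
temp file. A sketch:)

     # iterate over the entries of $SRCE directly; the quoting keeps
     # names with spaces intact
     for REPO_PATH in "$SRCE"/*
     do
         REPO=`basename "$REPO_PATH"`
         echo "Found $REPO"
     done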

>> do
>>     # some things are not repositories; ignore them
>>     if [ -d $SRCE/$REPO ]
Here is a likely bug in the script. I treat every subdirectory of the 
SVNParent repository collection as if it were a repository. But it might 
not be. There might be valid reasons for having a different type of 
subdirectory in there. Probably this line should read something like
     if [ -d $SRCE/$REPO/hooks ]
     then
         ... back up the repository ...
     else
         ... just copy it over ...
     fi

>>     then
>>         # back up this repository
>>         echo "Backing up $REPO"
>>         # use hotcopy to get an exact copy
>>         # that can be reloaded onto the same system
>>         svnadmin  hotcopy  $SRCE/$REPO   $DEST/$REPO
>>         # use dump to get an inexact copy
>>         # that can be reloaded anywhere
>>         svnadmin  dump     $SRCE/$REPO>$DEST/$REPO.dump
>>     fi
>> done
> See above. You're not reporting failures, in case the repository is
> not of a Subversion release compatible with the current "svnadmin"
> command. (This has happened to me when someone copied a repository to
> a server with older Subversion.)
Yes. But then the failure was in setting up the repository, not in 
backing it up. Perhaps I should run
     svnadmin verify $SRCE/$REPO
first and take appropriate action if it fails. Oh, please don't tell me 
that 'svnadmin verify' doesn't really verify completely!
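
In script form that check might look like this (a sketch; I have not 
decided what "appropriate action" should be, so for now it just complains):

     # verify the repository before backing it up; skip it and warn if broken
     if svnadmin verify $SRCE/$REPO > /dev/null 2>&1
     then
         svnadmin hotcopy $SRCE/$REPO $DEST/$REPO
         svnadmin dump $SRCE/$REPO > $DEST/$REPO.dump
     else
         echo "WARNING: $REPO failed verification and was NOT backed up"
     fi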

On another point, "reporting failures" ought to mean "sending e-mail to 
the sysadmin telling him that it failed." I've been trying to do that 
for years and cannot. I cannot send e-mail to an arbitrary target e-mail 
address user@example.com from a Linux shell script.
* Most mail-sending tools require 'sendmail', notoriously the hardest 
program on the planet to configure.
* I found that installing 'sendmail', and not configuring it at all, 
prevented apache from starting at boot time. Probably something wrong 
with init.
* Much of the documentation on sendmail only covers sending e-mail to an 
account on that server computer, not to user@example.com elsewhere in 
the world. As if servers were timesharing systems.
* Sendmail has to talk to an SMTP server. In the past couple of years it 
seems as if all the SMTP servers in the world have been linked into an 
authorization environment to prevent spam. So you can't just run your 
own SMTP - it's not certificated.
* Thunderbird knows how to log in to an SMTP server; last time I looked 
sendmail did not.

Without e-mail, any notification system requires my contacting the 
machine, rather than the machine contacting me. And that is unreliable.
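
(The closest thing I have found to a workable approach from a plain 
shell script is curl, which can speak SMTP itself and log in to an 
authenticated server, assuming the curl on the machine is new enough to 
have SMTP support. A sketch, with the server, account and addresses all 
made up:)

     # build a minimal message and hand it to an authenticated SMTP server
     # smtp.example.com, the account and both addresses are placeholders
     {
         echo "From: backup@example.com"
         echo "To: sysadmin@example.com"
         echo "Subject: SVN backup failed"
         echo ""
         echo "See /root/SVNBackup.out on the server."
     } > /tmp/SVNBackup.mail
     curl --url smtp://smtp.example.com:587 --ssl-reqd \
          --mail-from backup@example.com --mail-rcpt sysadmin@example.com \
          --user 'backup@example.com:PASSWORD' \
          --upload-file /tmp/SVNBackup.mail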

>> # Show the contents
>> echo "Contents of the backup:"
>> ls -ld $DEST/*
This is for /root/SVNBackup.out. It lists the repositories that have 
been included in the backup.

Indeed, the above line that reads
     echo "Backing up $REPO"
only exists because hotcopy outputs progress info. I tried "--quiet" and 
it didn't shut up. Maybe "-q" works.

>> # zip up the result
>> cd $DEST
>> zip -r -q -y $DEST.zip .
> Don't use zip for this. zip is not installed by default on a lot of
> UNIX and Linux systems, tar and gzip are, and give better compression.
> Just about every uncompression suite in the world supports .tgz files
> as gzipped tarfiles, so it's a lot more portable.
The 'zip' program is installable on every computer I've ever known. And, 
at least until recently, there were LOTS of operating systems that did 
not support .tar.gz or .tar.bz2 or the like. IMHO a zipped file is, in 
practice, a lot more portable. And the compression ratio is close enough 
that I'm willing to accept 15% less compression for the portability.
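
(For comparison, the tar-and-gzip version Nico suggests would be about 
one line in place of the zip command; it is not the difficulty that 
stops me, just the container format:)

     # equivalent archive as a gzipped tarball instead of a zip file
     cd $DEST
     tar -czf $DEST.tgz .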

> Also, the script has ignored the problems of symlinks. You may not use
> them, but a stack of people use symlinked files to pre-commit scripts,
> password files, or other tools among various repositories from an
> outside source. If you haven't at least searched for and reported
> symlinks, you've got little chance of properly replicating them for
> use elsewhere.
My guess is that there are two types of symlinks: those that point 
inside the repository and those that point outside it. Those that point 
inside the repository should be no problem. Those that point outside 
the repository are bad, because there is no guarantee that the thing 
pointed to exists on any given machine that you use.

And AFAIK 'svnadmin hotcopy' and 'svnadmin dump' preserve symlinks as 
such, and that is the best that I can do in either case.

Also, this is the kind of thing where you back up the symlink and later, 
if we must restore, some human being asks "What does this symlink point 
to?"
>> # Talk to the user
>> echo "Backup is in file $DEST.zip:"
>> ls -ld $DEST.zip
> It looks like you're relying on "ls -ld"
Again, this is a more-or-less standard part for the purpose of putting 
information into the /root/SVNBackup.out file. All of my backup scripts 
do this. Sometimes I look and say to myself "Why did the backup suddenly 
triple in size?" and dig around and discover that some subdirectory was 
added that should not have been present.

>> # The file $DEST.zip can now be transported to another computer.
> And for a big repository, this is *grossly* inefficient. Transmitting
> bulky compressed files means that you have to send the whole thing in
> one bundle, or incorporate wrappers to split it into manageable
> chunks. This gets awkward as your Subversion repositories grow, and
> they *will* grow because Subversion really discourages discarding
> *anything* from the repositories.
A backup file that is created on an attached portable disk does not need 
to be transported.

A backup file that is transmitted over a LAN once a day is not a 
problem, no matter how big it is; 3 hours is a reasonable time frame for 
the transfer.

Historically I ran a crontab job every morning at 10AM that copied a 
backup file to a particular workstation on the LAN. By 10AM that 
workstation is turned on, and if it slows down, well, the lady who uses 
it is not technically minded enough to figure out WHY it's slowing down. 
And it was only a few megabytes.

Yeah, a zip of the entire SVNParent repository collection might be too 
big to send over the Internet.

Oh yes, one more thing. Using svnadmin in various ways, it is possible 
to purge old revisions from a repository. I would expect that we do that 
periodically, maybe once a year. If we're putting out version 5.0 of 
something, version 3.0 should not be in the repository; it should be in 
an archive.
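
(The mechanism I have in mind is dumping a revision range and reloading 
it into a fresh repository; a sketch only, with the repository name and 
cut-off revision made up, and remembering that a reload renumbers 
revisions, so it needs more care than this:)

     # keep an archive dump of everything up to r5000,
     # then rebuild the live repository from r5001 onward
     svnadmin dump $SRCE/myrepo -r 0:5000 > /archive/myrepo-to-r5000.dump
     svnadmin dump $SRCE/myrepo -r 5001:HEAD > /tmp/myrepo-recent.dump
     svnadmin create $SRCE/myrepo.new
     svnadmin load $SRCE/myrepo.new < /tmp/myrepo-recent.dump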
>   I'd very strongly urge you to review
> the use of "svnsync" to mirror the content of the repository to
> another server on another system, coupled with a wrapper to get any
> non-database components separately. This also reduces "churn" on your
> drives, and can be so much faster that you can safely run it every 10
> minutes for a separate read-only mirror site, a ViewVC or Fisheye
> viewable repository, or publication of externally accessible
> downloadable source.
I shy away from svnsync right now because it requires me to get TWO of 
these Subversion systems running. At present I am almost able to get one 
running. Almost.

> As harsh as I'm being, Andy, it's actually not bad for a first shot by
> someone who hasn't been steeped in the pain of writing industry grade
> code like some of us. For a one-off in a very simple environment, it's
> fine, something to get this week's backups done while you think about
> a more thorough tool, it's reasonable, except for the big booby trap
> David Chapman pointed out about using the hotcopies, not the active
> repositories, for zipping up.
Thank you. I think the key phrase here is "a very simple environment". 
How much do we pay for a server? 400 dollars. One guy recommended buying 
a server for 4,000 dollars and he was darned near fired for it.

I fixed the booby trap already. Your comments will lead to some other 
changes. But not, for now, a second computer.

OH! I thought of something else!

Suppose we do a backup every night at midnight, copying it to a safe 
place. And suppose that the server dies at 8 PM Tuesday evening. Then 
all commits that occurred on Tuesday have been lost. Presumably we'd 
find out about this on Wednesday.

But a working copy is a valid working copy until you delete it. Assuming 
that the working copies still exist, all we need to do is
* Restore the working SVNParent repository collection on a replacement 
computer.
* Have everyone 'svn commit' from their working copies.
* Unscramble the merge problems, which should be few.

This becomes feasible if nobody deletes their working copy until 48 
hours after their last commit. And my guess is that people will do that 
naturally. People who are working on the package will keep one working 
copy indefinitely, updating it but not checking out a whole new one. 
People who do only brief work on the package will not purge the working 
copy until they start worrying about disk space.
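
(The restore itself, from the nightly zip onto a replacement machine, 
would be roughly this; a sketch, and "MyRepo" and the paths are made up:)

     # unpack the backup and put the hotcopy back where Apache expects it
     mkdir -p /tmp/svnrestore
     cd /tmp/svnrestore
     unzip -q /path/to/svnbackup.zip
     mkdir -p /data/svn
     cp -a /tmp/svnrestore/MyRepo /data/svn/MyRepo
     chown -R www-data:www-data /data/svn/MyRepo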

Thank you very much.

