lucene-solr-user mailing list archives

From Shawn Heisey <s...@elyograg.org>
Subject Re: Backing up SolR 4.0
Date Mon, 03 Dec 2012 18:04:53 GMT
On 12/3/2012 9:47 AM, Andy D'Arcy Jewell wrote:
> However, wouldn't re-creating the index on a large dataset take an 
> inordinate amount of time? The system I will be backing up is likely 
> to undergo rapid development and thus schema changes, so I need some 
> kind of insurance against corruption if we need to roll-back after a 
> change.
>
> How should I go about creating multiple backup versions I can put aside 
> (e.g. on tape) to hedge against the down-time which would be required 
> to regenerate the indexes from scratch?

Serious production Solr installs require at least two copies of your 
index.  Failures *will* happen, and sometimes they'll be the kind of 
failures that will take down an entire machine.  You can plan for some 
failures -- redundant power supply and RAID are important for this.  
Some failures will cause downtime, though -- multiple disk failures, 
motherboard, CPU, memory, software problems wiping out your index, user 
error, etc.  If you have at least one other copy of your index, you'll be 
able to keep the system operational while you fix the down machine.

Replication is a very good way to get two or more copies of your 
index.  I would expect that most production Solr installations use 
either plain replication or SolrCloud.  I do my redundancy a different 
way that gives me a lot more flexibility, but replication is a VERY 
solid way to go.
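For reference, plain replication in Solr 4.x is configured through the 
ReplicationHandler in solrconfig.xml.  A minimal sketch -- the host, 
port, and core names below are placeholders, not anything from this 
thread:

```xml
<!-- solrconfig.xml on the master: publish the index after each commit -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- solrconfig.xml on a slave: poll the master every 60 seconds -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/collection1</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Each slave then maintains its own full copy of the index, which is 
exactly the redundancy described above.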

If you are running on a UNIX/Linux platform (just about anything *other* 
than Windows), and backups via replication are not enough for you, you 
can use the hardlink capability in the OS to avoid taking Solr down 
while you make backups.  Here's the basic sequence:

1) Pause indexing, wait for all commits and merges to complete.
2) Create a target directory on the same filesystem as your Solr index.
3) Make hardlinks of all files in your Solr index in the target directory.
4) Resume indexing.
5) Copy the target directory to your backup location at your leisure.
6) Delete the hardlink copies from the target directory.
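The sequence above can be sketched as a small script.  INDEX_DIR and 
SNAP_DIR are assumptions -- substitute your real Solr data directory.  
For illustration the script builds a throwaway "index" so it is 
self-contained; steps 1 and 4 (pausing and resuming indexing) are your 
application's job and are only noted in comments.

```shell
#!/bin/sh
set -e

# Stand-in for a real index; in practice INDEX_DIR would be something
# like /var/solr/data/index, and SNAP_DIR must be on the SAME filesystem.
WORK=$(mktemp -d)
INDEX_DIR="$WORK/index"
SNAP_DIR="$WORK/snapshot"
mkdir -p "$INDEX_DIR"
echo "segment data" > "$INDEX_DIR/_0.cfs"   # fake Lucene segment file

# Step 1: pause indexing here; wait for commits and merges to finish.

# Steps 2+3: cp -l copies the directory tree but hardlinks the files,
# so this completes near-instantly regardless of index size.
cp -al "$INDEX_DIR" "$SNAP_DIR"

# Step 4: resume indexing here.

# Step 5: copy the snapshot somewhere durable, at your leisure, e.g.:
# rsync -a "$SNAP_DIR/" backuphost:/backups/solr/$(date +%F)/

# Step 6: drop the hardlinks once the copy is safely off-box.
rm -rf "$SNAP_DIR"
```

Note that `cp -al` is a GNU coreutils extension; on other systems you 
can loop over the files with `ln` to the same effect.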

Making hardlinks is a near-instantaneous operation.  The way that 
Solr/Lucene works will guarantee that your hardlink copy will continue 
to be a valid index snapshot no matter what happens to the live index.  
If you can make the backup and get the hardlinks deleted before your 
index undergoes a merge, the hardlinks will use very little extra disk 
space.
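The reason the snapshot stays valid is that Lucene never rewrites a 
segment file in place -- a merge writes new files and then deletes the 
old ones.  A hardlink is a second name for the same data, so deleting 
the live index's name leaves the snapshot's copy untouched.  A 
two-line demonstration (file names are made up):

```shell
#!/bin/sh
set -e
DEMO=$(mktemp -d)
echo "old segment" > "$DEMO/_0.cfs"
ln "$DEMO/_0.cfs" "$DEMO/_0.cfs.bak"   # hardlink: same inode, two names
rm "$DEMO/_0.cfs"                      # a "merge" removes the old segment
cat "$DEMO/_0.cfs.bak"                 # snapshot's data is still readable
```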

If you leave the hardlink copies around, your live index will 
eventually diverge to the point where the copy shares no files with it 
and therefore consumes its full amount of disk space.  If you have a 
*LOT* of extra disk space on the Solr server, you can keep multiple 
hardlink copies around as snapshots.

Recent versions of Windows do support hardlinks on NTFS (via mklink 
/H), so there may in fact be a way to do this on Windows.  I will 
leave that for someone else to pursue.

Thanks,
Shawn

