lucene-solr-user mailing list archives

From Andy D'Arcy Jewell <>
Subject Re: Backing up SolR 4.0
Date Tue, 04 Dec 2012 08:55:07 GMT
On 03/12/12 18:04, Shawn Heisey wrote:
> Serious production Solr installs require at least two copies of your 
> index.  Failures *will* happen, and sometimes they'll be the kind of 
> failures that will take down an entire machine.  You can plan for some 
> failures -- redundant power supply and RAID are important for this.  
> Some failures will cause downtime, though -- multiple disk failures, 
> motherboard, CPU, memory, software problems wiping out your index, 
> user error, etc.  If you have at least one other copy of your index, 
> you'll be able to keep the system operational while you fix the down 
> machine.
> Replication is a very good way to accomplish getting two or more 
> copies of your index.  I would expect that most production Solr 
> installations use either plain replication or SolrCloud.  I do my 
> redundancy a different way that gives me a lot more flexibility, but 
> replication is a VERY solid way to go.
> If you are running on a UNIX/Linux platform (just about anything 
> *other* than Windows), and backups via replication are not enough for 
> you, you can use the hardlink capability in the OS to avoid taking 
> Solr down while you make backups.  Here's the basic sequence:
> 1) Pause indexing, wait for all commits and merges to complete.
> 2) Create a target directory on the same filesystem as your Solr index.
> 3) Make hardlinks of all files in your Solr index in the target 
> directory.
> 4) Resume indexing.
> 5) Copy the target directory to your backup location at your leisure.
> 6) Delete the hardlink copies from the target directory.
> Making hardlinks is a near-instantaneous operation.  The way that 
> Solr/Lucene works will guarantee that your hardlink copy will continue 
> to be a valid index snapshot no matter what happens to the live 
> index.  If you can make the backup and get the hardlinks deleted 
> before your index undergoes a merge, the hardlinks will use very 
> little extra disk space.
> If you leave the hardlink copies around, eventually your live index 
> will diverge to the point where the copy has different files and 
> therefore takes up disk space.  If you have a *LOT* of extra disk 
> space on the Solr server, you can keep multiple hardlink copies around 
> as snapshots.
> Recent versions of Windows do have features similar to UNIX links, so 
> there may in fact be a way to do this on Windows.  I will leave that 
> for someone else to pursue.
> Thanks,
> Shawn
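The six-step hardlink sequence quoted above can be sketched as a shell script. This is a minimal sketch against a throwaway directory standing in for a real Solr data/index directory; the paths and file names are hypothetical, and step 1 (pausing indexing) is site-specific:

```shell
set -e
WORK=$(mktemp -d)
INDEX="$WORK/index"; SNAP="$WORK/snapshot"; BACKUP="$WORK/backup"
mkdir -p "$INDEX" "$BACKUP"
# Fake index files standing in for real Lucene segment files.
echo data > "$INDEX/_0.cfs"
echo segs > "$INDEX/segments_1"

# 1) Pause indexing; wait for all commits and merges to complete.
mkdir "$SNAP"               # 2) target dir on the SAME filesystem
ln "$INDEX"/* "$SNAP"/      # 3) hardlink every index file (near-instant)
# 4) Resume indexing here.
cp -a "$SNAP"/. "$BACKUP"/  # 5) real copy to the backup location
rm -r "$SNAP"               # 6) delete the hardlink copies
```

In real use, step 5 would typically be an rsync to another host or a mounted backup volume rather than a local cp.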
Thanks Shawn, that's very informative. I get twitchy with anything where 
you "can't" back it up (memcached excepted). As an administrator, it's 
my job to recover from failures, and backups are kind of my comfort blanket.

I'm running on Linux (Debian Squeeze) in a fully virtual 
environment.  Initially, I think I'll have to just schedule the backup 
for the early hours (local time), but as we grow, I can see I'll have to 
use replication to do it seamlessly. The system is necessarily small 
right now, as we haven't yet gone live, but we are anticipating rapid 
growth, so replication has always been on the cards.

Is there an easy way to tell (say from a shell script) when "all commits 
and merges [are] complete"?
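As far as I know, Solr 4.0 exposes no single "all commits and merges are done" flag, so one shell-level heuristic is to poll the index directory until its listing (names, sizes, mtimes) is identical on two consecutive checks. This is an assumption-laden sketch, demonstrated here against a throwaway directory rather than a real data/index:

```shell
set -e
INDEX=$(mktemp -d)            # stand-in for Solr's data/index dir
echo seg > "$INDEX/_0.cfs"

prev=""
while :; do
    # Fingerprint the directory listing; any write changes it.
    cur=$(ls -ln "$INDEX" | md5sum)
    if [ "$cur" = "$prev" ]; then break; fi
    prev=$cur
    sleep 1                   # use a longer interval in production
done
echo "index appears quiescent"
```

Two identical polls only suggest quiescence; a long-running merge that hasn't started writing yet would still slip through, so pausing your own indexer first remains essential.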

If I keep a replica solely for backup purposes, I assume I can "do what 
I like with it" - presumably replication will resume and catch up when I 
re-enable it. (I admit I have a bit of reading to do regarding 
replication - I just skimmed that part because it wasn't in my initial 
brief.)

I'm assuming that because you're using hardlinks, SolR writes a "new" 
file when it updates (sort of copy-on-write style)? So we are relying 
on the principle that as long as at least one remaining reference to 
the data exists, it isn't deleted...
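That principle is exactly how POSIX hardlinks behave: the data persists until its link count drops to zero. A tiny demonstration, using a made-up file name in a temporary directory:

```shell
set -e
dir=$(mktemp -d)
echo "segment data" > "$dir/_0.cfs"
ln "$dir/_0.cfs" "$dir/backup_0.cfs"   # second name for the same inode
rm "$dir/_0.cfs"                       # the "live index" deletes its file
out=$(cat "$dir/backup_0.cfs")         # data still readable via the link
echo "$out"
rm -r "$dir"
```

Lucene's write-once segment files make this safe: once written, a segment file is never modified in place, only eventually deleted, so a hardlinked copy stays a consistent snapshot.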

Thanks once again!


Andy D'Arcy Jewell

SysMicro Limited
Linux Support
