lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jeff Wartes <jwar...@whitepages.com>
Subject Re: Data Import Handler / Backup indexes
Date Mon, 23 Nov 2015 18:56:07 GMT

The backup/restore approach in SOLR-5750 and in solrcloud_manager is
really just that - copying the index files.
On backup, it saves your index directories, and on restore, it puts them
in the data dir, moves a pointer for the current index dir, and opens a
new searcher. Both are mostly just wrappers on the proper Solr
replication-handler commands, since Solr already has some lower level APIs
for these operations.

There is a shared filesystem requirement for backup/restore though, which
is to account for the fact that when you make the backup you don’t know
which nodes will need to restore a given shard.

The commands would look something like:

    java -jar solrcloud_manager-assembly-1.4.0.jar backupindex -z
zk0.example.com:2181/myapp -c collection1 --dir <shareddir>
    java -jar solrcloud_manager-assembly-1.4.0.jar restoreindex -z
zk0.example.com:2181/myapp -c collection1 --dir <shareddir>

Or you could restore into a new collection:
    java -jar solrcloud_manager-assembly-1.4.0.jar backupindex -z
zk0.example.com:2181/myapp -c collection1 --dir <shareddir>
    java -jar solrcloud_manager-assembly-1.4.0.jar clonecollection -z
zk0.example.com:2181/myapp -c newcollection --fromCollection collection1
    java -jar solrcloud_manager-assembly-1.4.0.jar restoreindex -z
zk0.example.com:2181/myapp -c newcollection --dir <shareddir>
--restoreFrom collection1

If you don’t have a shared filesystem, you can still do the copy
collection route:
    java -jar solrcloud_manager-assembly-1.4.0.jar clonecollection -z
zk0.example.com:2181/myapp -c newcollection --fromCollection collection1

    java -jar solrcloud_manager-assembly-1.4.0.jar copycollection -z
zk0.example.com:2181/myapp -c newcollection --fromCollection collection1

This creates a new collection with the same settings, (clonecollection)
and triggers a one-shot “replication” into it. (copycollection) Again,
this is just framework for the proper (largely undocumented) Solr API
commands, to work around the lack of a convenient collections-level API
command.

One nice thing about using copy collection is that it can be used to keep
a backup collection up to date, only copying if necessary. Honestly
though, I don’t have as much experience with this use case as KNitin does
in solrcloud-haft, which is why I suggest using an empty collection in the
README right now. If you try that use case with solrcloud_manager, I’d be
interested in your experience. It should work, but you’ll need to disable
the verification with --skipCheck and check manually.


Having said all that though, yes, with your simple use case and small
collection, you can do everything you want with just cp. The easiest way
would be to make a backup copy of your index dir. If you need to restore,
shut down solr, nuke your index dir, and copy the backup in there. You’d
probably need to do this on all nodes at once though, to prevent a
non-leader from coming up and re-syncing with a piece of the index you
hadn’t restored yet.




On 11/21/15, 10:12 PM, "Brian Narsi" <bnarsi70@gmail.com> wrote:

>What are the caveats regarding the copy of a collection?
>
>At this time DIH takes only about 10 minutes. So in case of accidental
>delete we can just re-run the DIH. The reason I am thinking about backup
>is
>just in case records are deleted accidentally and the DIH cannot be run
>because the database is unavailable.
>
>Our collection is simple: 2 nodes - 1 collection - 2 shards with 2
>replicas
>each
>
>So a simple copy (cp command) for both the nodes/shards might work for us?
>How do I restore the data back?
>
>
>
>On Tue, Nov 17, 2015 at 4:56 PM, Jeff Wartes <jwartes@whitepages.com>
>wrote:
>
>>
>> https://github.com/whitepages/solrcloud_manager supports 5.x, and I
>>added
>> some backup/restore functionality similar to SOLR-5750 in the last
>> release.
>> Like SOLR-5750, this backup strategy requires a shared filesystem, but
>> note that unlike SOLR-5750, I haven’t yet added any backup functionality
>> for the contents of ZK. I’m currently working on some parts of that.
>>
>>
>> Making a copy of a collection is supported too, with some caveats.
>>
>>
>> On 11/17/15, 10:20 AM, "Brian Narsi" <bnarsi70@gmail.com> wrote:
>>
>> >Sorry I forgot to mention that we are using SolrCloud 5.1.0.
>> >
>> >
>> >
>> >On Tue, Nov 17, 2015 at 12:09 PM, KNitin <nitin.tnvl@gmail.com> wrote:
>> >
>> >> afaik Data import handler does not offer backups. You can try using
>>the
>> >> replication handler to backup data as you wish to any custom end
>>point.
>> >>
>> >> You can also try out : https://github.com/bloomreach/solrcloud-haft.
>> >>This
>> >> helps backup solr indices across clusters.
>> >>
>> >> On Tue, Nov 17, 2015 at 7:08 AM, Brian Narsi <bnarsi70@gmail.com>
>> wrote:
>> >>
>> >> > I am using Data Import Handler to retrieve data from a database
>>with
>> >> >
>> >> > full-import, clean = true, commit = true and optimize = true
>> >> >
>> >> > This has always worked correctly without any errors.
>> >> >
>> >> > But just to be on the safe side, I am thinking that we should do a
>> >>backup
>> >> > before initiating Data Import Handler. And just in case something
>> >>happens
>> >> > restore the backup.
>> >> >
>> >> > Can backup be done automatically (before initiating Data Import
>> >>Handler)?
>> >> >
>> >> > Thanks
>> >> >
>> >>
>>
>>

Mime
View raw message