lucene-dev mailing list archives

From "Hrishikesh Gadre (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-9038) Ability to create/delete/list snapshots for a solr collection
Date Mon, 25 Apr 2016 19:41:12 GMT

     [ https://issues.apache.org/jira/browse/SOLR-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hrishikesh Gadre updated SOLR-9038:
-----------------------------------
    Description: 
Work is currently under way to implement a backup/restore API for SolrCloud (SOLR-5750). SOLR-5750
is about providing the ability to "copy" index files and collection metadata to a configurable
location.

In addition to this, we should also provide a facility to create "named" snapshots for a Solr
collection. Here, by "snapshot" I mean configuring the underlying Lucene IndexDeletionPolicy
to not delete a specific commit point (e.g. using PersistentSnapshotIndexDeletionPolicy).
This should not be confused with SOLR-5340, which implements core-level "backup" functionality.
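
To make the idea of "not deleting a specific commit point" concrete, here is a minimal sketch. It deliberately does not use the Lucene or Solr classes: NamedSnapshotPolicy and all of its methods are hypothetical names, commit points are modeled as plain generation numbers, and the base behaviour is assumed to be "keep only the newest commit". It is an illustration of the retention rule, not a proposed implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical toy model, NOT Lucene or Solr API: a commit point is just a
// generation number, and a named snapshot pins one generation.
class NamedSnapshotPolicy {
    private final TreeSet<Long> commits = new TreeSet<>();
    private final Map<String, Long> snapshots = new HashMap<>();

    // Record a new commit point, then drop every commit that is neither the
    // newest one nor pinned by a named snapshot.
    void onCommit(long generation) {
        commits.add(generation);
        long newest = commits.last();
        commits.removeIf(g -> g != newest && !snapshots.containsValue(g));
    }

    // Pin the newest commit point under the given name.
    void createSnapshot(String name) {
        if (commits.isEmpty()) {
            throw new IllegalStateException("no commit point to snapshot");
        }
        if (snapshots.containsKey(name)) {
            throw new IllegalArgumentException("snapshot already exists: " + name);
        }
        snapshots.put(name, commits.last());
    }

    // Unpin the commit; it becomes eligible for deletion at the next commit.
    void deleteSnapshot(String name) {
        if (snapshots.remove(name) == null) {
            throw new IllegalArgumentException("no such snapshot: " + name);
        }
    }

    // Snapshot names in sorted order.
    List<String> listSnapshots() {
        List<String> names = new ArrayList<>(snapshots.keySet());
        names.sort(null);
        return names;
    }

    // Commit generations currently kept, in ascending order.
    List<Long> retainedCommits() {
        return new ArrayList<>(commits);
    }

    public static void main(String[] args) {
        NamedSnapshotPolicy policy = new NamedSnapshotPolicy();
        policy.onCommit(1);
        policy.createSnapshot("before-experiment");
        policy.onCommit(2);
        policy.onCommit(3);
        System.out.println(policy.retainedCommits()); // [1, 3]
        policy.deleteSnapshot("before-experiment");
        policy.onCommit(4);
        System.out.println(policy.retainedCommits()); // [4]
    }
}
```

Note that creating or deleting a snapshot in this model only changes bookkeeping; no index file is copied or moved.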

The primary motivation for this feature is to decouple recording/preserving a known consistent
state of a collection from actually "copying" the relevant files to a physically separate
location. This decoupling has a number of advantages:
- We can use specialized data-copying tools for transferring Solr index files. E.g. in a Hadoop
environment, the [distcp|https://hadoop.apache.org/docs/r1.2.1/distcp2.html] tool is typically
used to copy files from one location to another. This tool provides various options to configure
the degree of parallelism and bandwidth usage, as well as integration with different types and
versions of file systems (e.g. AWS S3, Azure Blob store etc.).
- This separation of concerns would also help Solr focus on its key functionality (i.e.
querying and indexing) while delegating the copy operation to tools built for that purpose.
- Users can decide if/when to copy the data files, as opposed to just creating a snapshot. E.g. a
user may want to create a snapshot of a collection before making an experimental change (e.g.
updating/deleting docs, schema change etc.). If the experiment succeeds, the user can delete the
snapshot (without having to copy the files). If the experiment fails, the user can copy the files
associated with the snapshot and restore.
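
As a rough sketch of the "copy later, and only if needed" step: once a snapshot has recorded which files belong to a commit point, copying them is a separate operation that any tool can perform. The example below uses plain java.nio file copies as a stand-in for a tool such as distcp. SnapshotCopy, copySnapshot and the file names are hypothetical and only for illustration; how Solr would actually expose the file list of a snapshot is not decided here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT Solr API: copies only the files that a snapshot
// refers to, leaving newer index files behind.
class SnapshotCopy {
    static void copySnapshot(Path indexDir, List<String> snapshotFiles, Path target)
            throws IOException {
        Files.createDirectories(target);
        for (String name : snapshotFiles) {
            Files.copy(indexDir.resolve(name), target.resolve(name),
                    StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Made-up index directory with two files from the snapshotted commit
        // and one file written after the snapshot was taken.
        Path index = Files.createTempDirectory("index");
        Files.write(index.resolve("segments_1"), new byte[] {1});
        Files.write(index.resolve("_0.cfs"), new byte[] {2});
        Files.write(index.resolve("_1.cfs"), new byte[] {3});

        Path backup = Files.createTempDirectory("backup");
        copySnapshot(index, Arrays.asList("segments_1", "_0.cfs"), backup);

        System.out.println(Files.exists(backup.resolve("_0.cfs"))); // true
        System.out.println(Files.exists(backup.resolve("_1.cfs"))); // false
    }
}
```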

Note that the Apache Blur project also provides a similar feature: [BLUR-132|https://issues.apache.org/jira/browse/BLUR-132]

  was:
Work is currently under way to implement a backup/restore API for SolrCloud (SOLR-5750). SOLR-5750
is about providing the ability to "copy" index files and collection metadata to a configurable
location.

In addition to this, we should also provide a facility to create "named" snapshots for a Solr
collection. Here, by "snapshot" I mean configuring the underlying Lucene IndexDeletionPolicy
to not delete a specific commit point (e.g. using PersistentSnapshotIndexDeletionPolicy).
This should not be confused with SOLR-5340, which implements core-level "backup" functionality.

The primary motivation for this feature is to decouple recording/preserving a known consistent
state of a collection from actually "copying" the relevant files to a physically separate
location. This decoupling has a number of advantages:
- We can use specialized data-copying tools for transferring Solr index files. E.g. in a Hadoop
environment, the [distcp|https://hadoop.apache.org/docs/r1.2.1/distcp2.html] tool is typically
used to copy files from one location to another. This tool provides various options to configure
the degree of parallelism and bandwidth usage, as well as integration with different types and
versions of file systems (e.g. AWS S3, Azure Blob store etc.).
- This separation of concerns would also help Solr focus on its key functionality (i.e.
querying and indexing) while delegating the copy operation to tools built for that purpose.
- Users can decide if/when to copy the data files, as opposed to just creating a snapshot. E.g. a
user may want to create a snapshot of a collection before making an experimental change (e.g.
updating/deleting docs, schema change etc.). If the experiment succeeds, the user can delete the
snapshot (without having to copy the files). If the experiment fails, the user can copy the files
associated with the snapshot and restore from the snapshot.

Note that the Apache Blur project also provides a similar feature: [BLUR-132|https://issues.apache.org/jira/browse/BLUR-132]


> Ability to create/delete/list snapshots for a solr collection
> -------------------------------------------------------------
>
>                 Key: SOLR-9038
>                 URL: https://issues.apache.org/jira/browse/SOLR-9038
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>            Reporter: Hrishikesh Gadre



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

