lucene-dev mailing list archives

From "Cao Manh Dat (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
Date Fri, 09 Dec 2016 15:05:58 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15735547#comment-15735547 ]

Cao Manh Dat commented on SOLR-9835:
------------------------------------

Here is the detailed design that Shalin and I have been working on recently.

h2. Overview
The current replication mechanism of SolrCloud is state-machine replication: replicas start in the same initial state, and each input is distributed across all replicas so that every replica ends up in the same next state.

But this type of replication has some drawbacks:
- Indexing and commits have to run on all replicas
- Slow recovery: if a replica misses more than N updates while it is down, it usually has to download the entire index from its leader.

So we introduce another replication mode for SolrCloud called state transfer, which acts like master/slave replication. Basically:
- The leader distributes each update to the other replicas, but only the leader applies the update to its IndexWriter; the other replicas just store the update in their UpdateLog (they only journal it)
- Replicas periodically poll the leader for the latest segments.

Pros:
- Lightweight indexing, because only the leader runs updates and commits
- Very fast recovery: if a replica fails, it only has to download the newest segments instead of re-downloading the entire index.

Cons:
- The leader can become a hotspot when there are many replicas (we can optimize this later)
- Longer turnaround time for document visibility compared to the current NRT replication

h2. Commit
When we commit, we write the version of the update command into the commit data, in addition to the timestamp that is written today.
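
A minimal sketch of what that could look like, assuming hypothetical key names ({{commitTimeMSec}} and {{commitCommandVer}} below are illustrative, not necessarily the keys the patch will use); {{IndexWriter.setLiveCommitData}} is the Lucene API for attaching user data to the next commit:

{code:java}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.index.IndexWriter;

// Sketch only: attach the version of the last update command, alongside the
// usual timestamp, as user data on the next commit. Key names are illustrative.
public class VersionedCommit {
  public static void commitWithVersion(IndexWriter writer, long commitCommandVersion)
      throws IOException {
    Map<String, String> commitData = new HashMap<>();
    commitData.put("commitTimeMSec", String.valueOf(System.currentTimeMillis()));
    commitData.put("commitCommandVer", String.valueOf(commitCommandVersion));
    writer.setLiveCommitData(commitData.entrySet()); // picked up by the next commit
    writer.commit();
  }
}
{code}

A replica can then read the version back from a commit point via {{IndexCommit.getUserData()}} when comparing its index against the leader's.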

h2. Update
# When a replica receives an update request, it forwards the request to:
#* the correct leader (in the case of add document and delete-by-id)
#* all leaders (in the case of delete-by-query and commit)
# The leader assigns a version to the update as it does today, writes it to its update log, and applies it to the IndexWriter
# The leader distributes the update to its replicas
# When a replica receives the distributed update, it writes it to its update log and returns success
# The leader returns success
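
The following is a rough sketch of steps 2-4, not the actual DistributedUpdateProcessor changes; all types and names are simplified stand-ins introduced for illustration:

{code:java}
import java.io.IOException;
import java.util.List;

// Simplified stand-ins for Solr's real classes; illustrative only.
interface TransactionLog { void append(long version, byte[] update) throws IOException; }
interface LocalIndexWriter { void apply(byte[] update) throws IOException; }
interface ReplicaClient { void send(long version, byte[] update) throws IOException; }

class StateTransferUpdateFlow {
  private final boolean isLeader;
  private final TransactionLog tlog;
  private final LocalIndexWriter writer;
  private final List<ReplicaClient> replicas;
  private long versionCounter = 0;

  StateTransferUpdateFlow(boolean isLeader, TransactionLog tlog,
                          LocalIndexWriter writer, List<ReplicaClient> replicas) {
    this.isLeader = isLeader;
    this.tlog = tlog;
    this.writer = writer;
    this.replicas = replicas;
  }

  /** Called once the request has reached the right node (step 1). */
  void process(long leaderAssignedVersion, byte[] update) throws IOException {
    if (isLeader) {
      long version = ++versionCounter;      // the leader assigns the version (step 2)
      tlog.append(version, update);         // write to the update log
      writer.apply(update);                 // only the leader touches the IndexWriter
      for (ReplicaClient replica : replicas) {
        replica.send(version, update);      // distribute to replicas (step 3)
      }
    } else {
      // non-leader replica: journal only, never apply to the IndexWriter (step 4)
      tlog.append(leaderAssignedVersion, update);
    }
  }
}
{code}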

h2. Periodic Segment Replication
Replica will poll the leader for the latest commit point generation and version and compare
against the latest commit point locally. If it is the same, then replication is successful
and nothing needs to be done. If not, then:
# Replica downloads all files since the local commit point from the leader
# Installs into the index, synchronize on the update log, close the old tlog, create a new
one and copy over all records from the old tlog which have version greater than the latest
commit point’s version. This is to ensure that any update which hasn’t made it to the
index is preserved in the current tlog. This might lead to duplication of updates in the previous
and the current tlogs but that is okay.
# Replica re-opens the searcher

For example:
Leader has the following tlogs
- TLog1 has versions 1,2,3,commit
- TLog2 has versions 4,5

Before segment replication, the replica has the following tlogs:
- TLog1 - 1,2,3,4,commit

After segment replication, the replica will have the following tlogs:
- TLog1 - 1,2,3,4,commit
- TLog2 - 4,5


During this process, the replica does not need to be put into the ‘recovery’ state. It remains ‘active’ and continues to participate in indexing and searches.
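
A rough sketch of one such polling cycle and the tlog copy-over, with hypothetical names (this is not Solr's actual IndexFetcher/UpdateLog API):

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: one segment-replication cycle on a replica. All names are
// hypothetical stand-ins, not Solr's actual classes.
class SegmentReplicator {

  static class LogRecord {
    final long version;
    final byte[] payload;
    LogRecord(long version, byte[] payload) { this.version = version; this.payload = payload; }
  }

  interface LeaderClient {
    long[] latestCommit() throws IOException;                  // {generation, version}
    void downloadFilesSince(long localGeneration) throws IOException;
  }
  interface LocalIndex {
    long[] latestCommit() throws IOException;                  // {generation, version}
    void installDownloadedFiles() throws IOException;
    void reopenSearcher() throws IOException;
  }
  interface UpdateLog {
    List<LogRecord> currentTlogRecords();
    void rollToNewTlog(List<LogRecord> carryOver) throws IOException; // close old tlog, open new one
  }

  void pollOnce(LeaderClient leader, LocalIndex index, UpdateLog ulog) throws IOException {
    long[] leaderCommit = leader.latestCommit();
    long[] localCommit = index.latestCommit();
    if (leaderCommit[0] == localCommit[0] && leaderCommit[1] == localCommit[1]) {
      return; // commit points match: nothing to do
    }

    // 1. download and install all index files newer than the local commit point
    leader.downloadFilesSince(localCommit[0]);
    index.installDownloadedFiles();

    // 2. synchronize on the update log: close the old tlog, open a new one, and carry over
    //    every record whose version is greater than the installed commit point's version,
    //    so that updates not yet in the index stay in the current tlog
    synchronized (ulog) {
      List<LogRecord> carryOver = new ArrayList<>();
      for (LogRecord record : ulog.currentTlogRecords()) {
        if (record.version > leaderCommit[1]) {
          carryOver.add(record);
        }
      }
      ulog.rollToNewTlog(carryOver);
    }

    // 3. re-open the searcher so the new segments become visible
    index.reopenSearcher();
  }
}
{code}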

h2. Replica recovery
# The replica puts itself in the ‘recovery’ state.
# The replica compares its latest commit point against the leader’s.
# If the index is the same as the leader’s, it performs a ‘peersync’ with the leader but only writes the peersync updates to its update log.
# If the index is not the same as the leader’s, or if peersync fails, the replica (see the sketch below):
#* Puts its update log into the “buffering” state
#* Issues a hard commit to the leader
#* Copies the index commit points from the leader that do not exist locally
#* Publishes itself as ‘active’
# The “buffering” state in the steps above ensures that any updates that haven’t yet been committed on the leader are also present in the replica’s current transaction log.

With respect to the current recovery strategy in Solr, we need only one change: check the index version of the leader against that of the replica before attempting a peersync.
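
A rough sketch of this decision flow, again with hypothetical names rather than the actual RecoveryStrategy code:

{code:java}
import java.io.IOException;

// Illustrative only: the recovery decision for a replica under the new scheme.
class StateTransferRecovery {

  interface LeaderClient {
    boolean hasSameCommitPoint() throws IOException;          // compare latest commit points
    boolean peerSyncIntoUpdateLogOnly() throws IOException;   // sync updates into the tlog only
    void requestHardCommit() throws IOException;              // ask the leader to hard commit
    void copyMissingCommitPoints() throws IOException;        // fetch commit points missing locally
  }
  interface UpdateLog { void setBuffering(boolean on); }
  interface ClusterState { void publish(String state); }

  void recover(LeaderClient leader, UpdateLog ulog, ClusterState cluster) throws IOException {
    cluster.publish("recovering");                            // step 1
    if (leader.hasSameCommitPoint()) {                        // steps 2 and 3
      // index already matches the leader: peersync, but only write the synced
      // updates to the local update log (never to the IndexWriter)
      if (leader.peerSyncIntoUpdateLogOnly()) {
        cluster.publish("active");
        return;
      }
    }
    // step 4: index differs, or peersync failed
    ulog.setBuffering(true);            // buffer live updates arriving meanwhile
    leader.requestHardCommit();         // make sure the leader's latest state is committed
    leader.copyMissingCommitPoints();   // download commit points not present locally
    cluster.publish("active");
  }
}
{code}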

h2. Leader Election
When a leader dies, a candidate replica will become a new leader. The leader election algorithm
remains mostly the same except that after the “sync” step, the leader candidate will replay
its transaction log before publishing itself as the leader.
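
Sketched with hypothetical names, the only new step is the replay between the sync and the publish:

{code:java}
// Illustrative only: the extra step in leader election under the new scheme.
class LeaderElectionStep {
  interface Core {
    void syncWithReplicas() throws Exception;          // existing "sync" step, unchanged
    void replayTlogFromLastCommit() throws Exception;  // apply buffered tlog entries to the IndexWriter
    void publishAsLeader() throws Exception;
  }

  void becomeLeader(Core core) throws Exception {
    core.syncWithReplicas();
    core.replayTlogFromLastCommit(); // NEW: make the index reflect everything in the tlog
    core.publishAsLeader();          // only then advertise leadership
  }
}
{code}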

h2. Collection property to switch replication scheme
A new property called “onlyLeaderIndexes” will be added to the collection. Any collection
that has this property set to true will only index to the elected leader and the rest of the
replicas will only fetch index segments from the leader as described above in the document.
This property must be set during collection creation. It will default to “false”. Existing
collections cannot be switched to using the new replication scheme. Future work can attempt
to fix that.
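
For illustration only, assuming the property ends up exposed as a parameter on the Collections API CREATE command (how it is actually passed is an implementation detail not fixed by this design), creating such a collection might look like:

{code}
http://localhost:8983/solr/admin/collections?action=CREATE&name=myCollection&numShards=2&replicationFactor=3&collection.configName=myConfig&onlyLeaderIndexes=true
{code}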

h2. FAQ

Q: What happens on a soft commit?
A: A soft commit is nothing but a searcher opened from the IndexWriter, which flushes a new segment to disk but does not commit it. In this case the newly written segment, not being part of any commit point, is not replicated at all. Effectively, soft commits are not useful in this new design currently. Future work may attempt to solve this problem. The same answer applies to searcher re-opens triggered by real-time get.


> Create another replication mode for SolrCloud
> ---------------------------------------------
>
>                 Key: SOLR-9835
>                 URL: https://issues.apache.org/jira/browse/SOLR-9835
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>
> The current replication mechanism of SolrCloud is state-machine replication: replicas start in the same initial state, and each input is distributed across all replicas so that every replica ends up in the same next state.
> But this type of replication has some drawbacks:
> - The commit (which is costly) has to run on all replicas
> - Slow recovery: if a replica misses more than N updates while it is down, it has to download the entire index from its leader.
> So we create another replication mode for SolrCloud called state transfer, which acts like master/slave replication. Basically:
> - The leader distributes each update to the other replicas, but only the leader applies the update to its IndexWriter; the other replicas just store the update in their UpdateLog.
> - Replicas periodically poll the leader for the latest segments.
> Pros:
> - Lightweight indexing, because only the leader runs commits and updates.
> - Very fast recovery: replicas just have to download the missing segments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

