lucene-dev mailing list archives

From "Cao Manh Dat (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-9835) Create another replication mode for SolrCloud
Date Tue, 03 Jan 2017 02:23:58 GMT

     [ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cao Manh Dat updated SOLR-9835:
-------------------------------
    Attachment: SOLR-9835.patch

Updated the patch for this issue; the changes are pretty solid now.

The main difference between {{onlyLeaderIndexes}} mode and the current mode is that in
{{onlyLeaderIndexes}} mode replicas can serve stale data. So I modified TestInjection to make
replicas wait for the indexFetcher to finish upon receiving a commit request, which lets us
reuse the existing SolrCloud tests to test {{onlyLeaderIndexes}} mode. These tests fail
(5 of the 206 SolrCloud tests):
- CdcrVersionReplicationTest, ShardSplitTest, SyncSliceTest: we can notify users that
{{onlyLeaderIndexes}} does not support CDCR, ShardSplit and SyncSlice yet.
- LeaderFailureAfterFreshStartTest, PeerSyncReplicationTest: we don't support PeerSync yet.
I think all these tests can be ignored for this issue; we can tackle the failures in other
tickets.
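
To illustrate the test hook described above, here is a minimal sketch of a replica whose commit blocks until an in-flight fetch from the leader has finished. This is not the code from the patch: the class {{WaitForFetchSketch}}, the switch {{waitForFetchOnCommit}} and the methods {{startFetch}}, {{commit}} and {{visibleVersion}} are hypothetical names chosen for illustration, and the fetch is simulated with a short sleep rather than a real segment download.

```java
import java.util.concurrent.CompletableFuture;

class WaitForFetchSketch {
    // Hypothetical test-only switch; outside of tests it would stay false,
    // so replicas could keep serving stale data between polls.
    static volatile boolean waitForFetchOnCommit = false;

    static class Replica {
        private volatile int visibleVersion = 0;
        private volatile CompletableFuture<Void> inFlightFetch =
            CompletableFuture.completedFuture(null);

        // Simulates an asynchronous download of the leader's latest segments.
        void startFetch(int leaderVersion) {
            inFlightFetch = CompletableFuture.runAsync(() -> {
                try {
                    Thread.sleep(50); // pretend the download takes a while
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                visibleVersion = leaderVersion;
            });
        }

        // With the switch on, commit returns only after the fetch is done,
        // so a test that searches right after commit sees fresh data.
        void commit() {
            if (waitForFetchOnCommit) {
                inFlightFetch.join();
            }
        }

        int visibleVersion() {
            return visibleVersion;
        }
    }

    public static void main(String[] args) {
        waitForFetchOnCommit = true;
        Replica replica = new Replica();
        replica.startFetch(7);
        replica.commit();
        System.out.println("visible version after commit: " + replica.visibleVersion());
    }
}
```

With the switch off, {{commit}} returns immediately and a search right after it may still see the old version, which is the stale-read behaviour the existing tests would otherwise trip over.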

I also ran the Jepsen tests for this mode ( https://lucidworks.com/blog/2014/12/10/call-maybe-solrcloud-jepsen-flaky-networks/
). The tests passed, so I think we can be pretty sure that the new mode is consistent and
partition tolerant.

> Create another replication mode for SolrCloud
> ---------------------------------------------
>
>                 Key: SOLR-9835
>                 URL: https://issues.apache.org/jira/browse/SOLR-9835
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Assignee: Shalin Shekhar Mangar
>         Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine replication: replicas
> start in the same initial state, and each input is distributed across replicas so that
> all replicas end up in the same next state.
> But this type of replication has some drawbacks:
> - The commit (which is costly) has to run on all replicas.
> - Slow recovery: if a replica misses more than N updates during its downtime, it has
> to download the entire index from its leader.
> So we create another replication mode for SolrCloud called state transfer, which
> acts like master/slave replication. Basically:
> - The leader distributes the update to the other replicas, but only the leader applies the
> update to the IndexWriter (IW); the other replicas just store the update in the UpdateLog
> (acting like replication).
> - Replicas frequently poll the latest segments from the leader.
> Pros:
> - Lightweight for indexing, because only the leader applies updates and runs commits.
> - Very fast recovery: replicas just have to download the missing segments.
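
The state transfer flow in the description above can be sketched roughly as follows. This is a toy in-memory model under stated assumptions, not Solr code: {{StateTransferSketch}}, {{Leader}}, {{Replica}}, {{poll}} and {{searchableDocs}} are hypothetical names, a "segment" is just an immutable list of documents, and polling is an explicit method call rather than a background thread.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class StateTransferSketch {

    /** Hypothetical leader: logs each update, indexes it, and cuts a segment on commit. */
    static class Leader {
        final List<String> updateLog = new ArrayList<>();
        final List<String> writerBuffer = new ArrayList<>();      // stands in for the IndexWriter
        final List<List<String>> segments = new ArrayList<>();    // committed, immutable segments

        void add(String doc) {
            updateLog.add(doc);
            writerBuffer.add(doc);   // only the leader indexes the document
        }

        void commit() {
            if (writerBuffer.isEmpty()) {
                return;
            }
            segments.add(Collections.unmodifiableList(new ArrayList<>(writerBuffer)));
            writerBuffer.clear();
        }
    }

    /** Hypothetical replica: only logs updates and copies segments from the leader. */
    static class Replica {
        final List<String> updateLog = new ArrayList<>();
        final List<List<String>> segments = new ArrayList<>();

        void add(String doc) {
            updateLog.add(doc);      // no indexing work on the replica
        }

        /** Copies only the segments this replica is missing; returns how many were fetched. */
        int poll(Leader leader) {
            int fetched = 0;
            for (int i = segments.size(); i < leader.segments.size(); i++) {
                segments.add(leader.segments.get(i));
                fetched++;
            }
            return fetched;
        }

        int searchableDocs() {
            int count = 0;
            for (List<String> segment : segments) {
                count += segment.size();
            }
            return count;
        }
    }

    public static void main(String[] args) {
        Leader leader = new Leader();
        Replica replica = new Replica();

        for (String doc : new String[] {"doc1", "doc2"}) {
            leader.add(doc);
            replica.add(doc);
        }
        leader.commit();

        System.out.println("before poll: " + replica.searchableDocs() + " searchable docs");
        System.out.println("segments fetched: " + replica.poll(leader));
        System.out.println("after poll: " + replica.searchableDocs() + " searchable docs");
    }
}
```

Between the leader's commit and the next poll the replica has the updates in its log but cannot search them yet, which is the window in which stale data can be served. A replica that was down simply polls again and copies only the segments it lacks.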



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

