lucene-dev mailing list archives

From "Shalin Shekhar Mangar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
Date Thu, 23 Feb 2017 08:51:44 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880130#comment-15880130 ]

Shalin Shekhar Mangar commented on SOLR-9835:
---------------------------------------------

bq. I don't think so, switchToNewTlog() is based on commit version at lucene index level (commit.getUserData().get(SolrIndexWriter.COMMIT_COMMAND_VERSION)),
so we will always roll over updates in right way.

I understand that we use the commit version of the latest commit, but the copyOverOldUpdates
method only copies updates from the last tlog. If a hard commit happens on the replica between
the start and the end of replication, the last tlog will not contain all the updates
that should be copied over. Example:
* Leader has the following tlogs with these versions:
** tlog0: 1,2,3,4,commit
** tlog1: 5,6
* Replica has tlogs:
** tlog0: 1,2,3,4,5
** replication from leader starts
** user calls explicit commit on replica
** tlog1: 6
** replication completes and we call switchToNewTlog, which copies over all versions greater than 4 from the last tlog
** tlog2: 6

In this case, the update with version 5 is lost and will no longer be available in case this
replica becomes leader.
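
The scenario above can be replayed with a small toy model. This is only a sketch in Python, not Solr code: tlogs are modelled as plain lists of version numbers, and both function names are hypothetical stand-ins. `copy_over_last_tlog_only` mimics the behaviour described above (scan only the last tlog), while `copy_over_all_tlogs` illustrates one possible remedy (scan every tlog), which is an assumption and not necessarily what the patch should do.

```python
# Toy model of the replica's tlogs; NOT Solr code.
# A tlog is a list of update versions; "commit" marks a hard commit.
COMMIT = "commit"


def copy_over_last_tlog_only(tlogs, commit_version):
    """Hypothetical stand-in for the behaviour described above:
    only the last tlog is scanned for versions > commit_version."""
    return [v for v in tlogs[-1] if v != COMMIT and v > commit_version]


def copy_over_all_tlogs(tlogs, commit_version):
    """Hypothetical remedy (an assumption): scan every tlog, oldest
    first, for versions > commit_version."""
    return [
        v
        for tlog in tlogs
        for v in tlog
        if v != COMMIT and v > commit_version
    ]


# Replica state from the example: the explicit commit on the replica
# closed tlog0 after version 5, so version 6 landed in tlog1.
replica_tlogs = [[1, 2, 3, 4, 5, COMMIT], [6]]

# The commit replicated from the leader covers versions up to 4.
leader_commit_version = 4

new_tlog_buggy = copy_over_last_tlog_only(replica_tlogs, leader_commit_version)
new_tlog_fixed = copy_over_all_tlogs(replica_tlogs, leader_commit_version)
lost = sorted(set(new_tlog_fixed) - set(new_tlog_buggy))

print("last tlog only:", new_tlog_buggy)
print("all tlogs:     ", new_tlog_fixed)
print("lost versions: ", lost)
```

In this model the version that goes missing is exactly the one written before the replica-side commit but after the leader's commit point.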

bq. CDCR is very complex, I don't think we should support CDCR in this new replication mode
now.

Okay, let's create a follow-up issue for this. CDCR is important enough that we must support
it eventually. But I think backup/restore must be supported and tested.

> Create another replication mode for SolrCloud
> ---------------------------------------------
>
>                 Key: SOLR-9835
>                 URL: https://issues.apache.org/jira/browse/SOLR-9835
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Assignee: Shalin Shekhar Mangar
>         Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch,
SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch,
SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is a state machine approach: replicas start in the same initial state, and each input is distributed to all replicas, so all replicas end up in the same next state.
> But this type of replication has some drawbacks:
> - The commit (which is costly) has to run on all replicas.
> - Slow recovery: if a replica misses more than N updates during its downtime, it has to download the entire index from its leader.
> So we create another replication mode for SolrCloud called state transfer, which acts like master/slave replication. Basically:
> - The leader distributes each update to the other replicas, but only the leader applies the update to the IW; the other replicas just store the update in the UpdateLog (acting like replication).
> - Replicas frequently poll the latest segments from the leader.
> Pros:
> - Lightweight for indexing, because only the leader runs commits and applies updates.
> - Very fast recovery: replicas just have to download the missing segments.
> From a CAP point of view, this ticket tries to promise end users a distributed system with:
> - Partition tolerance
> - Weak consistency for normal queries: the cluster can serve stale data. This happens when the leader has finished a commit and a slave is still fetching the latest segments. This period lasts at most {{pollInterval + time to fetch latest segment}}.
> - Consistency for RTG: just like the original SolrCloud mode
> - Weak availability: just like the original SolrCloud mode. If a leader goes down, clients must wait until a new leader is elected.
> To use this new replication mode, a new collection must be created with an additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=2&replicationFactor=1&liveReplicas=1
> {code}
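
For scripted use, the CREATE request quoted above could be assembled as in the following sketch. It only builds the URL and does not contact a server. The host, port and collection name are taken from the example; `liveReplicas` is the parameter proposed in this ticket and should not be assumed to exist in a released Solr.

```python
from urllib.parse import urlencode

# Base URL of the Collections API, as used in the example above.
base_url = "http://localhost:8983/solr/admin/collections"

# Parameters from the example; liveReplicas is the parameter proposed
# in this ticket and may not exist in a released Solr.
params = {
    "action": "CREATE",
    "name": "newCollection",
    "numShards": 2,
    "replicationFactor": 1,
    "liveReplicas": 1,
}

# Dicts keep insertion order, so the query string matches the example.
url = base_url + "?" + urlencode(params)
print(url)
```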



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

