lucene-dev mailing list archives

From Tomás Fernández Löbbe (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SOLR-9835) Create another replication mode for SolrCloud
Date Tue, 10 Jan 2017 19:29:58 GMT

    [ https://issues.apache.org/jira/browse/SOLR-9835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815930#comment-15815930 ]

Tomás Fernández Löbbe commented on SOLR-9835:
---------------------------------------------

Great idea! I just took a quick look at the patch to understand this better. I have a couple
of questions/comments. I know this is work in progress, so feel free to disregard any of my
comments if you are already working on them:

{code}
onlyLeaderIndexes = zkStateReader.getClusterState().getCollection(collection).getLiveReplicas() == 1;
{code}
Maybe add a method to DocCollection like {{isOnlyLeaderIndexes()}} (or choose another name)?
I understand why you did this, but this code is repeated many times; maybe it can be improved
for now.
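
To make the suggestion concrete, here is a minimal sketch. The class below is a simplified stand-in, not the real DocCollection, and both the {{liveReplicas}} field and the null handling are my assumptions based on the snippet above:

```java
/** Simplified stand-in for DocCollection; only the part relevant to the suggestion. */
final class DocCollectionSketch {
  // Assumed to be null when the collection was created without the parameter.
  private final Integer liveReplicas;

  DocCollectionSketch(Integer liveReplicas) {
    this.liveReplicas = liveReplicas;
  }

  Integer getLiveReplicas() {
    return liveReplicas;
  }

  /** True when only the leader indexes and the other replicas pull segments from it. */
  boolean isOnlyLeaderIndexes() {
    return liveReplicas != null && liveReplicas == 1;
  }
}
```

Callers would then ask the collection directly instead of repeating the {{== 1}} comparison.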

{code}
private Map<String, ReplicateFromLeader> replicateFromLeaders = new HashMap<>();
{code}
Does this need to be synchronized?
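
If the map can be touched from more than one thread, one option is a {{ConcurrentHashMap}}. This is only a sketch: {{Poller}} is a hypothetical placeholder for {{ReplicateFromLeader}}, and the start/stop method names are made up for illustration:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class ReplicateFromLeaderRegistry {
  /** Hypothetical placeholder for the patch's ReplicateFromLeader. */
  static final class Poller {
    final String coreName;

    Poller(String coreName) {
      this.coreName = coreName;
    }
  }

  // ConcurrentHashMap makes each individual operation thread-safe without external locking.
  private final Map<String, Poller> replicateFromLeaders = new ConcurrentHashMap<>();

  /** Registers a poller for the core at most once, even if several threads race here. */
  Poller startReplication(String coreName) {
    return replicateFromLeaders.computeIfAbsent(coreName, Poller::new);
  }

  /** Removes and returns the poller, or null if none was registered. */
  Poller stopReplication(String coreName) {
    return replicateFromLeaders.remove(coreName);
  }
}
```

If all accesses already happen under a single lock, the plain {{HashMap}} is fine and this is unnecessary.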

{code}
-  private final String masterUrl;
+  private String masterUrl;
{code}
Should {{masterUrl}} now be volatile?
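
What I have in mind, as a sketch only ({{MasterUrlHolder}} is a hypothetical class, and I am assuming the field is written by one thread and read by another):

```java
final class MasterUrlHolder {
  // volatile guarantees that a reader thread sees the most recent write to the field.
  private volatile String masterUrl;

  MasterUrlHolder(String masterUrl) {
    this.masterUrl = masterUrl;
  }

  /** Called, for example, when the leader changes. */
  void setMasterUrl(String masterUrl) {
    this.masterUrl = masterUrl;
  }

  String getMasterUrl() {
    return masterUrl;
  }
}
```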

{code}
+  public static boolean waitForInSyncWithLeader(SolrCore core, Replica leaderReplica) throws InterruptedException {
+    if (waitForReplicasInSync == null) return true;
+
+    Pair<Boolean,Integer> pair = parseValue(waitForReplicasInSync);
+    boolean enabled = pair.first();
+    if (!enabled) return true;
+
+    Thread.sleep(1000);
+    HttpSolrClient leaderClient = new HttpSolrClient.Builder(leaderReplica.getCoreUrl()).build();
+    long leaderVersion = -1;
+    String localVersion = null;
+    try {
+      for (int i = 0; i < pair.second(); i++) {
+        if (core.isClosed()) return true;
+        ModifiableSolrParams params = new ModifiableSolrParams();
+        params.set(CommonParams.QT, ReplicationHandler.PATH);
+        params.set(COMMAND, CMD_DETAILS);
+
+        NamedList<Object> response = leaderClient.request(new QueryRequest(params));
+        leaderVersion = (long) ((NamedList)response.get("details")).get("indexVersion");
+
+        localVersion = core.getDeletionPolicy().getLatestCommit().getUserData().get(SolrIndexWriter.COMMIT_TIME_MSEC_KEY);
+        if (localVersion == null && leaderVersion == 0) return true;
+
+        if (localVersion != null && Long.parseLong(localVersion) == leaderVersion) {
+          return true;
+        } else {
+          Thread.sleep(500);
+        }
+      }
+
+    } catch (Exception e) {
+      log.error("Exception when wait for replicas in sync with master");
+    } finally {
+      try {
+        if (leaderClient != null) leaderClient.close();
+      } catch (IOException e) {
+        e.printStackTrace();
+      }
+    }
+
+    return false;
+  }

{code}
In many cases in the tests the leader will change before the replication happens, right? Does
it make sense to discover the leader inside the loop? Also, is there a way to remove that
{{Thread.sleep(1000)}} at the beginning? This code will be called very frequently in tests.
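
A rough sketch of the loop shape I mean. Nothing here is Solr API: the three functional parameters are hypothetical stand-ins for the leader lookup, the replication-details request and the local commit version, and returning false on interrupt is my choice for the sketch:

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;
import java.util.function.ToLongFunction;

final class InSyncWaiter {
  /**
   * Polls until the local index version matches the current leader's, or attempts run out.
   * The leader is looked up on every attempt, so a leader change during the wait is picked up.
   */
  static boolean waitForInSync(Supplier<String> currentLeaderUrl,
                               ToLongFunction<String> leaderVersionOf,
                               LongSupplier localVersion,
                               int maxAttempts,
                               long pauseMillis) {
    for (int i = 0; i < maxAttempts; i++) {
      String leaderUrl = currentLeaderUrl.get(); // re-discover the leader each time
      if (leaderVersionOf.applyAsLong(leaderUrl) == localVersion.getAsLong()) {
        return true; // already in sync: no sleep on the fast path
      }
      if (i < maxAttempts - 1) {
        try {
          Thread.sleep(pauseMillis); // pause only between failed attempts
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // preserve the interrupt for the caller
          return false;
        }
      }
    }
    return false;
  }
}
```

With this shape a test whose replica is already in sync returns after a single check instead of paying the initial one-second sleep.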

> Create another replication mode for SolrCloud
> ---------------------------------------------
>
>                 Key: SOLR-9835
>                 URL: https://issues.apache.org/jira/browse/SOLR-9835
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Assignee: Shalin Shekhar Mangar
>         Attachments: SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch, SOLR-9835.patch
>
>
> The current replication mechanism of SolrCloud is called state machine replication: replicas start in the same initial state, and each input is distributed across the replicas so that all replicas end up in the same next state.
> But this type of replication has some drawbacks:
> - The commit (which is costly) has to run on all replicas.
> - Slow recovery: if a replica misses more than N updates during its downtime, it has to download the entire index from its leader.
> So we create another replication mode for SolrCloud called state transfer, which acts like master/slave replication. Basically:
> - The leader distributes each update to the other replicas, but only the leader applies the update to IW; the other replicas just store the update in the UpdateLog (acting like replication).
> - Replicas frequently poll the latest segments from the leader.
> Pros:
> - Lightweight for indexing, because only the leader runs commits and updates.
> - Very fast recovery: replicas just have to download the missing segments.
> To use this new replication mode, a new collection must be created with an additional parameter {{liveReplicas=1}}
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE&name=newCollection&numShards=2&replicationFactor=1&liveReplicas=1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
