lucene-dev mailing list archives

From "Bernd Fehling (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-10733) Rule-based Replica Placement not working correct
Date Thu, 01 Jun 2017 07:45:04 GMT

    [ https://issues.apache.org/jira/browse/SOLR-10733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032590#comment-16032590 ]

Bernd Fehling commented on SOLR-10733:
--------------------------------------

Some explanation of Rule.java / ReplicaAssigner.java and what this patch does.

After running all parts many times to understand Rule and ReplicaAssigner, this is *roughly*
how it works:
- numShards and replicationFactor build a sorted list of Positions, e.g. [shard1:0, shard2:0, shard1:1, shard2:1, ...]
- there is a sorted list of LiveNodes
- there is a list of all Rules

It selects the first shard from the Positions list, takes the first node from the LiveNodes
list, and checks the node with its tags against all Rules:
- if the selected node with its tags doesn't pass *all* Rules, it is skipped and the next
node is selected
- if the selected node with its tags passes *all* Rules, it is assigned to the selected shard

This continues until all shards with their replicas are filled.
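
A minimal, self-contained sketch of this selection loop (class name, node names and the toy
rule are illustrative stand-ins, not the actual ReplicaAssigner code; real Rules match node
tags such as port, node or rack and support wildcards):
{code}
import java.util.*;
import java.util.function.BiPredicate;

// Toy model of the loop described above: walk the Positions list and, for
// each position, assign the first node from LiveNodes that passes all rules.
public class AssignmentSketch {
    public static void main(String[] args) {
        int numShards = 2, replicationFactor = 2;

        // Sorted Positions list: [shard1:0, shard2:0, shard1:1, shard2:1]
        List<String> positions = new ArrayList<>();
        for (int r = 0; r < replicationFactor; r++)
            for (int s = 1; s <= numShards; s++)
                positions.add("shard" + s + ":" + r);

        // Sorted LiveNodes list (hypothetical node names).
        List<String> liveNodes = new ArrayList<>(Arrays.asList(
            "server1:7574", "server1:8983", "server2:7574", "server2:8983"));

        // All rules must pass. Toy rule: shard1 replicas must live on port 8983.
        List<BiPredicate<String, String>> rules = Collections.singletonList(
            (position, node) -> !position.startsWith("shard1") || node.endsWith(":8983"));

        Map<String, String> assignment = new LinkedHashMap<>();
        for (String position : positions) {
            for (Iterator<String> it = liveNodes.iterator(); it.hasNext(); ) {
                String node = it.next();
                if (rules.stream().allMatch(rule -> rule.test(position, node))) {
                    assignment.put(position, node); // node passed *all* rules
                    it.remove();   // at most one replica per node in this toy
                    break;         // continue with the next position
                }                  // otherwise the node is simply skipped
            }
        }
        System.out.println(assignment);
        // {shard1:0=server1:8983, shard2:0=server1:7574, shard1:1=server2:8983, shard2:1=server2:7574}
    }
}
{code}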

Problem 1)
If the selected node under test doesn't pass the Rules, it is simply skipped.
If we have a rule without wildcards, then the node with its tags (port, node, rack, ...)
might fail for this shard but could pass the Rules later if tested against other Positions.
But this will never happen, because the list of LiveNodes always keeps the same sequence.
The solution here is to move the node to the end of the LiveNodes list so it might match
later on.
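
A sketch of that fix, assuming a mutable LiveNodes list (the helper name moveToEnd is mine,
not from the patch):
{code}
import java.util.*;

public class MoveToEndSketch {
    // A node that failed the Rules for the current position is rotated to the
    // end of liveNodes instead of staying in place, so it can be retried in a
    // different order for later Positions.
    static void moveToEnd(List<String> liveNodes, int index) {
        String failed = liveNodes.remove(index); // take the failing node out
        liveNodes.add(failed);                   // re-append it for later tries
    }

    public static void main(String[] args) {
        List<String> liveNodes = new ArrayList<>(Arrays.asList(
            "server1:7574", "server1:8983", "server2:8983"));
        // server1:7574 failed a port:8983 rule for shard1 -> rotate it back.
        moveToEnd(liveNodes, 0);
        System.out.println(liveNodes); // [server1:8983, server2:8983, server1:7574]
    }
}
{code}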

Problem 2)
It is possible (as in testPlacement2) that you have many nodes, but because of restrictive
Rules you can only assign 2 or 3 of the nodes from LiveNodes to shards. You run out of
nodes passing the Rules, and all nodes failing the Rules pile up at the end of the LiveNodes
list. This is solved by checking the position in LiveNodes against the number of nodes in
LiveNodes. If the position is higher, it starts again from the beginning of LiveNodes.
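
A sketch of the wrap-around check, assuming a running index into LiveNodes (illustrative
only, not the patch code):
{code}
import java.util.*;

public class WrapAroundSketch {
    // If the running index walks past the end of liveNodes (because failing
    // nodes piled up at the back), start again from the beginning instead of
    // running out of candidates.
    static String candidateAt(List<String> liveNodes, int index) {
        return liveNodes.get(index % liveNodes.size());
    }

    public static void main(String[] args) {
        List<String> liveNodes = Arrays.asList(
            "server1:8983", "server2:8983", "server3:8983");
        System.out.println(candidateAt(liveNodes, 4)); // wraps to index 1: server2:8983
    }
}
{code}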

Problem 3)
If you want to have only 1 replica, the rule will be "replica:<2" (as stated in "Rule-based
Replica Placement" in the Solr documentation).
Because each node is also counted as a replica, this leads to the situation where a shard
already has one node assigned.
For the next node, the test against the Rules is "is the number of replicas less than 2",
which passes, so the node is assigned to the shard.
At the end of all assignments there is a final pass which verifies the result against
all Rules.
But now the same test that passed before, "is the number of replicas less than 2", will fail,
because this time the test is done *after* the assignment.
This is solved by decreasing "NumberOfNodesWithSameTagVal" by one during the VERIFY phase.
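
A sketch of the idea behind that fix (the enum and method names are mine; the patch itself
adjusts "NumberOfNodesWithSameTagVal" inside the rule check):
{code}
public class VerifySketch {
    enum Phase { ASSIGN, VERIFY }

    // During ASSIGN the candidate node is not yet counted; during VERIFY it
    // already is, so the same "replica:<2" test would count one node too many.
    static boolean replicaLessThan(int limit, int nodesWithSameTagVal, Phase phase) {
        if (phase == Phase.VERIFY) {
            nodesWithSameTagVal--; // discount the already-assigned node itself
        }
        return nodesWithSameTagVal < limit;
    }

    public static void main(String[] args) {
        // ASSIGN: one replica already placed, testing whether a second fits.
        System.out.println(replicaLessThan(2, 1, Phase.ASSIGN)); // true -> assign
        // VERIFY: both replicas placed; without the decrement 2 < 2 would fail.
        System.out.println(replicaLessThan(2, 2, Phase.VERIFY)); // true -> verified
    }
}
{code}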


> Rule-based Replica Placement not working correct
> ------------------------------------------------
>
>                 Key: SOLR-10733
>                 URL: https://issues.apache.org/jira/browse/SOLR-10733
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Rules, SolrCloud
>    Affects Versions: 6.5.1
>            Reporter: Bernd Fehling
>            Assignee: Noble Paul
>         Attachments: SOLR-10733.patch, SOLR-10733.patch
>
>
> A setup of a SolrCloud with 6 nodes on 3 servers, e.g.:
> {code}
> server1:8983 , server1:7574
> server2:8983 , server2:7574
> server3:8983 , server3:7574
> {code}
> and a command for creating a new collection with rule:
> {code}
> http://localhost:8983/solr/admin/collections?action=CREATE&name=boss&
> collection.configName=boss_configs&numShards=3&replicationFactor=2&
> maxShardsPerNode=1&rule=shard:shard1,replica:<2,port:8983
> {code}
> should create a collection with 3 shards and at least shard1 with two different nodes running on port 8983.
> {code}
> shard1 --> server_x:8983 ,  server_y:8983
> {code}
> An even more restrictive rule like
> {code}
> rule=shard:shard1,replica:<2,port:8983&rule=shard:shard3,replica:<2,port:7574
> {code}
> should also resolve to a solution, because if it really checks all permutations across shards/replicas/ports and available nodes, it should be able to solve this.
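> For example (server numbering here is only illustrative), one assignment that would satisfy both rules on this 6-node setup is:
> {code}
> shard1 --> server1:8983 , server2:8983
> shard3 --> server1:7574 , server2:7574
> shard2 --> server3:8983 , server3:7574
> {code}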



