lucene-dev mailing list archives

From "Noble Paul (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud
Date Wed, 22 Apr 2015 07:40:00 GMT

     [ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Noble Paul updated SOLR-6220:
-----------------------------
    Description: 
h1.Objective
Most cloud-based systems allow users to specify rules for how the replicas/nodes of a cluster are allocated. Solr should have a flexible mechanism through which we can control the allocation of replicas, and later change it to suit the needs of the system.

All configuration is on a per-collection basis. The rules are applied whenever a replica is created in any of the shards of a given collection during:

 * collection creation
 * shard splitting
 * add replica
 * createshard

There are two aspects to how replicas are placed: the snitch and the placement rules.

h2.Snitch
A snitch identifies the tags of nodes. Snitches are configured through the collection create command with the {{snitch}} parameter, e.g. snitch=EC2Snitch or snitch=class:EC2Snitch.
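
For example, a minimal sketch of supplying a snitch at collection creation time (the collection name, shard/replica counts, and host/port below are placeholders, not taken from this issue):
{noformat}
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&snitch=class:EC2Snitch
{noformat}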


h2.ImplicitSnitch
This is shipped by default with Solr. The user does not need to specify {{ImplicitSnitch}} in the configuration; if the tags known to ImplicitSnitch are present in the rules, it is used automatically.
Tags provided by ImplicitSnitch:
# cores : number of cores in the node
# disk : disk space available in the node
# host : host name of the node
# node : node name
# D.* : values available from system properties. {{D.key}} refers to a value that is passed to the node as {{-Dkey=keyValue}} during node startup. It is possible to use rules like {{D.key:expectedVal,shard:*}} (see the example after this list).
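
As an illustration (the property name {{datacenter}} and the value {{dc1}} are hypothetical, not from this issue), a rule on a {{D.*}} tag might look like this:
{noformat}
# the node is started with -Ddatacenter=dc1
# all replicas of every shard must be placed on nodes carrying that property value
D.datacenter:dc1,shard:*
{noformat}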

h2.Rules

A rule tells how many replicas of a given shard need to be assigned to nodes with the given key-value pairs. Rules are passed to the collection CREATE API as a multivalued parameter named "rule". The values are saved in the state of the collection as follows:
{code:Javascript}
{
  "mycollection": {
    "snitch": {
      "class": "ImplicitSnitch"
    },
    "rules": [{"cores": "<4"},
              {"replica": "1", "shard": "*", "node": "*"},
              {"disk": ">100"}]
  }
}
{code}
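
The rule values above would be supplied at creation time roughly as follows (a sketch only; the collection name is a placeholder and other CREATE parameters are omitted). Because {{cores}}, {{disk}}, and {{node}} are tags known to ImplicitSnitch, no explicit snitch parameter is needed here:
{noformat}
action=CREATE&name=mycollection&rule=cores:<4&rule=replica:1,shard:*,node:*&rule=disk:>100
{noformat}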

A rule is specified in a pseudo-JSON syntax, i.e. a map of keys and values.
* Each collection can have any number of rules. As long as the rules do not conflict with each other it is fine; otherwise an error is thrown.
* In each rule, shard and replica can be omitted (see the example after this list).
** The default value of replica is {{\*}}, meaning ANY, or you can specify a count together with an operand such as {{<}} (less than) or {{>}} (greater than).
** The value of shard can be a shard name, {{\*}} meaning EACH shard, or {{**}} meaning ANY shard. The default value is {{\*\*}} (ANY).
* There should be exactly one extra condition in a rule other than {{shard}} and {{replica}}.
* All keys other than {{shard}} and {{replica}} are called tags, and the tags are nothing but values provided by the snitch for each node.
* By default, certain tags such as {{node}}, {{host}}, and {{port}} are provided by the system implicitly.
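
For instance, restating the defaults above, the following three rules are equivalent, since replica defaults to {{\*}} (ANY) and shard defaults to {{\*\*}} (ANY):
{noformat}
cores:<5
replica:*,cores:<5
replica:*,shard:**,cores:<5
{noformat}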

h3.How are nodes picked?
Nodes are not picked at random. The rules are used to first sort the nodes according to affinity. For example, if there is a rule that says {{disk:>100}}, nodes with more disk space are given higher preference, and if the rule is {{disk:<100}}, nodes with less disk space are given priority. If everything else is equal, nodes with fewer cores are given higher priority.
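
To illustrate the ordering (the node names and free-disk figures below are made up for the example):
{noformat}
# rule: disk:>100
# candidate nodes and free disk in GB: nodeA=500, nodeB=200, nodeC=150
# preference order: nodeA, nodeB, nodeC (more disk first); any ties are broken by fewer cores
{noformat}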

h3.Fuzzy match
Fuzzy match can be applied when strict matches fail. A value can be suffixed with {{~}} to specify fuzziness.

Example rules:
{noformat}
# Example requirement: "use only one replica of a shard in a rack if possible; if no matches are found, relax that rule".
rack:*,shard:*,replica:<2~

# Another example: assign all replicas to nodes with disk space of 100GB or more, or relax the rule if that is not possible.
# This ensures that if no node with 100GB of disk exists, nodes are picked in order of size; say, an 85GB node would be picked over an 80GB node.
disk:>100~
{noformat}
Examples:
{noformat}
# In each rack there can be a maximum of two replicas of EACH shard
rack:*,shard:*,replica:<3
# In each rack there can be a maximum of two replicas of ANY shard
rack:*,shard:**,replica:2
rack:*,replica:<3

# In each node there should be a maximum of one replica of EACH shard
node:*,shard:*,replica:<2
# In each node there should be a maximum of one replica of ANY shard
node:*,shard:**,replica:<2
node:*,replica:<2

# In rack 738, shard1 can have a maximum of zero replicas
rack:738,shard:shard1,replica:<1

# All replicas of shard1 should go to rack 730
shard:shard1,replica:*,rack:730
shard:shard1,rack:730

# All replicas must be created on a node with more than 20GB of disk
replica:*,shard:*,disk:>20
replica:*,disk:>20
disk:>20

# All replicas should be created on nodes with fewer than 5 cores
# Here ANY and EACH for shard have the same meaning
replica:*,shard:**,cores:<5
replica:*,cores:<5
cores:<5

# One replica of shard1 must go to node 192.168.1.2:8080_solr
node:"192.168.1.2:8080_solr",shard:shard1,replica:1

# No replica of shard1 should go to rack 738
rack:!738,shard:shard1,replica:*
rack:!738,shard:shard1

# No replica of ANY shard should go to rack 738
rack:!738,shard:**,replica:*
rack:!738,shard:*
rack:!738
{noformat}



In the collection CREATE API, all placement rules are provided as a multivalued parameter called {{rule}}. Example:
{noformat}
snitch=EC2Snitch&rule=shard:*,replica:1,dc:dc1&rule=shard:*,replica:<2,dc:dc3&rule=shard:shard1,replica:*,rack:!738
{noformat}
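
Putting it together, the same parameters might be sent to the Collections API roughly like this (the host, port, collection name, and shard/replica counts are placeholders, not from this issue):
{noformat}
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=2&snitch=EC2Snitch&rule=shard:*,replica:1,dc:dc1&rule=shard:*,replica:<2,dc:dc3&rule=shard:shard1,replica:*,rack:!738
{noformat}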



> Replica placement strategy for solrcloud
> ----------------------------------------
>
>                 Key: SOLR-6220
>                 URL: https://issues.apache.org/jira/browse/SOLR-6220
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>         Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


