lucene-dev mailing list archives

From "Noble Paul (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-6220) Replica placement strategy for solrcloud
Date Tue, 28 Apr 2015 15:40:07 GMT

     [ https://issues.apache.org/jira/browse/SOLR-6220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-6220:
-----------------------------
    Attachment:     (was: SOLR-6220.patch)

> Replica placement strategy for solrcloud
> ----------------------------------------
>
>                 Key: SOLR-6220
>                 URL: https://issues.apache.org/jira/browse/SOLR-6220
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>         Attachments: SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch,
SOLR-6220.patch, SOLR-6220.patch, SOLR-6220.patch
>
>
> h1.Objective
> Most cloud-based systems let you specify rules for how the replicas/nodes of a cluster are allocated. Solr should have a flexible mechanism for controlling the allocation of replicas, and for changing it later to suit the needs of the system.
> All configuration is on a per-collection basis. The rules are applied whenever a replica is created in any shard of a given collection, during:
>  * collection creation
>  * shard splitting
>  * add replica
>  * createshard
> There are two aspects to how replicas are placed: snitch and placement. 
> h2.snitch 
> A snitch identifies the tags of nodes. Snitches are configured through the collection create command with the {{snitch}} param, e.g. {{snitch=EC2Snitch}} or {{snitch=class:EC2Snitch}}.
> h2.ImplicitSnitch 
> This is shipped by default with Solr. The user does not need to specify {{ImplicitSnitch}} in the configuration. If tags known to ImplicitSnitch are present in the rules, it is used automatically.
> Tags provided by ImplicitSnitch:
> # cores: number of cores in the node
> # disk: disk space available in the node
> # host: host name of the node
> # node: node name
> # D.*: values available from system properties. {{D.key}} means a value that is passed to the node as {{-Dkey=keyValue}} during node startup. It is possible to use rules like {{D.key:expectedVal,shard:*}}
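The tag list above can be pictured as a per-node map. The following is only an illustrative sketch, not the snitch implementation: `implicit_tags` and its arguments are hypothetical names, and deriving the host by splitting the node name at the first colon is an assumption based on node names such as `192.168.1.2:8080_solr`.

```python
def implicit_tags(node_name, core_count, free_disk_gb, system_props):
    """Build the tag map for one node (hypothetical helper, not Solr code)."""
    # Assumption: node names look like host:port_context.
    host = node_name.split(":", 1)[0]
    tags = {
        "cores": core_count,
        "disk": free_disk_gb,
        "host": host,
        "node": node_name,
    }
    # Each -Dkey=value startup property is exposed as the tag D.key.
    for key, value in system_props.items():
        tags["D." + key] = value
    return tags


tags = implicit_tags("192.168.1.2:8080_solr", 3, 120, {"rack": "738"})
print(tags["host"], tags["D.rack"])
```

With such a map, a rule like `D.rack:738,shard:*` would be checked against the `D.rack` entry of each node.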
> h2.Rules 
> A rule states how many replicas of a given shard should be assigned to nodes with the given key-value pairs. These parameters are passed to the collection CREATE API as a multivalued parameter, "rule". The values are saved in the state of the collection as follows:
> {code:Javascript}
> {
>  "mycollection":{
>   "snitch": {
>       "class":"ImplicitSnitch"
>     },
>   "rules":[{"cores":"4-"},
>              {"replica":"1" ,"shard" :"*" ,"node":"*"},
>              {"disk":">100"}]
>  }
> }
> {code}
> A rule is specified in a pseudo-JSON syntax, as a map of keys and values.
> * Each collection can have any number of rules. As long as the rules do not conflict with each other, this is fine; otherwise an error is thrown.
> * In each rule, shard and replica can be omitted
> ** The default value of replica is {{\*}}, meaning ANY; alternatively you can specify a count and an operand such as {{<}} (less than) or {{>}} (greater than)
> ** The value of shard can be a shard name, {{\*}} meaning EACH, or {{\*\*}} meaning ANY. The default value is {{\*\*}} (ANY)
> * There should be exactly one extra condition in a rule other than {{shard}} and {{replica}}.
> * All keys other than {{shard}} and {{replica}} are called tags; tags are simply values provided by the snitch for each node
> * By default, certain tags such as {{node}}, {{host}}, {{port}} are provided by the system implicitly
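To make the defaults and the "exactly one tag" constraint concrete, here is a minimal sketch of parsing a rule string. `parse_rule` is a hypothetical helper written for illustration; it is not taken from the patch, and the real parser may differ.

```python
def parse_rule(rule):
    """Parse 'key:value,key:value' into a dict, applying the stated defaults."""
    conditions = {}
    for part in rule.split(","):
        # Split at the first colon only, so values such as host:port survive.
        key, _, value = part.strip().partition(":")
        conditions[key.strip()] = value.strip().strip('"')
    tags = [k for k in conditions if k not in ("shard", "replica")]
    if len(tags) != 1:
        raise ValueError("a rule needs exactly one tag besides shard and replica: " + rule)
    conditions.setdefault("shard", "**")   # default: ANY shard
    conditions.setdefault("replica", "*")  # default: ANY number of replicas
    return conditions


print(parse_rule("rack:!738"))
print(parse_rule('node:"192.168.1.2:8080_solr", shard:shard1, replica:1'))
```

Under these assumptions a bare `rack:!738` expands to the same map as `rack:!738,shard:**,replica:*`.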
> h3.How are nodes picked?
> Nodes are not picked at random. The rules are first used to sort the nodes according to affinity. For example, if there is a rule that says {{disk:100+}}, nodes with more disk space are given higher preference, and if the rule is {{disk:100-}}, nodes with less disk space are given priority. If everything else is equal, nodes with fewer cores are given higher priority.
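The ordering described here can be sketched as a sort with a tie-breaker. This is an assumed illustration only: the node dictionaries and `order_by_disk` are made up for the example and say nothing about how the patch implements the sort.

```python
# Hypothetical node data: free disk in GB and number of cores per node.
nodes = [
    {"node": "n1", "disk": 80, "cores": 2},
    {"node": "n2", "disk": 250, "cores": 6},
    {"node": "n3", "disk": 250, "cores": 1},
]


def order_by_disk(nodes, prefer_more=True):
    """Sort by disk affinity, breaking ties in favour of fewer cores."""
    sign = -1 if prefer_more else 1
    return sorted(nodes, key=lambda n: (sign * n["disk"], n["cores"]))


print([n["node"] for n in order_by_disk(nodes)])                     # disk:100+ style
print([n["node"] for n in order_by_disk(nodes, prefer_more=False)])  # disk:100- style
```

With `prefer_more=True`, n3 comes before n2 because both have the same disk space and n3 has fewer cores.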
> h3.Fuzzy match
> Fuzzy match can be applied when strict matches fail. Values can be suffixed with {{~}} to specify fuzziness.
> Example rules:
> {noformat}
> #Example requirement: "use only one replica of a shard in a host if possible; if no matches are found, relax that rule".
> rack:*,shard:*,replica:<2~
> #Another example: assign all replicas to nodes with disk space of 100GB or more, or relax the rule if not possible. This ensures that if no node with 100GB of disk exists, nodes are picked in order of size; say, an 85GB node would be picked over an 80GB node
> disk:>100~
> {noformat}
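The `disk:>100~` case can be sketched as "try the strict match, then relax". This is a simplified illustration under assumed data structures; `candidates_for_disk_rule` is a hypothetical name, and the actual relaxation logic in the patch may be more involved.

```python
def candidates_for_disk_rule(nodes, min_gb, fuzzy):
    """Return nodes satisfying disk > min_gb; if none match and the rule is fuzzy, relax it."""
    by_size = sorted(nodes, key=lambda n: n["disk"], reverse=True)
    strict = [n for n in by_size if n["disk"] > min_gb]
    if strict or not fuzzy:
        return strict
    # Rule relaxed: fall back to all nodes, largest disk first.
    return by_size


small_cluster = [{"node": "a", "disk": 80}, {"node": "b", "disk": 85}]
print(candidates_for_disk_rule(small_cluster, 100, fuzzy=True))
print(candidates_for_disk_rule(small_cluster, 100, fuzzy=False))
```

Without the `~`, the same cluster yields no candidates, which corresponds to the strict rule failing.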
> Examples:
> {noformat}
> #in each rack there can be max two replicas of a given shard
>  rack:*,shard:*,replica:<3
> #in each rack there can be max two replicas of ANY shard
>  rack:*,shard:**,replica:2
>  rack:*,replica:<3
>  #in each node there should be a max one replica of EACH shard
>  node:*,shard:*,replica:1-
>  #in each node there should be a max one replica of ANY shard
>  node:*,shard:**,replica:1-
>  node:*,replica:1-
>  
> #In rack 738 and shard=shard1, there can be a max 0 replica
>  rack:738,shard:shard1,replica:<1
>  
>  #All replicas of shard1 should go to rack 730
>  shard:shard1,replica:*,rack:730
>  shard:shard1,rack:730
>  #all replicas must be created in a node with at least 20GB disk
>  replica:*,shard:*,disk:>20
>  replica:*,disk:>20
>  disk:>20
> #All replicas should be created in nodes with less than 5 cores
> #In this case ANY and EACH for shard have the same meaning
> replica:*,shard:**,cores:<5
> replica:*,cores:<5
> cores:<5
> #one replica of shard1 must go to node 192.168.1.2:8080_solr
> node:"192.168.1.2:8080_solr", shard:shard1, replica:1
> #No replica of shard1 should go to rack 738
> rack:!738,shard:shard1,replica:*
> rack:!738,shard:shard1
> #No replica  of ANY shard should go to rack 738
> rack:!738,shard:**,replica:*
> rack:!738,shard:*
> rack:!738
> {noformat}
> In the collection create API, all the placement rules are provided as a parameter called {{rule}}, which can be repeated.
> Example:
> {noformat}
> snitch=EC2Snitch&rule=shard:*,replica:1,dc:dc1&rule=shard:*,replica:<2,dc:dc3&rule=shard:shard1,replica:,rack:!738
> {noformat}
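As a rough sketch of how a client might assemble such a request, the repeated `rule` parameter can be built with the Python standard library. The collection name and the base URL are placeholders chosen for illustration; `snitch` and `rule` are the parameters proposed in this issue.

```python
from urllib.parse import urlencode

# A list of pairs (not a dict) so that the "rule" key can appear more than once.
params = [
    ("action", "CREATE"),
    ("name", "mycollection"),  # placeholder collection name
    ("snitch", "EC2Snitch"),
    ("rule", "shard:*,replica:1,dc:dc1"),
    ("rule", "shard:*,replica:<2,dc:dc3"),
]

# Keep ':', '*' and ',' readable; characters such as '<' are percent-encoded.
query = urlencode(params, safe=":*,")
url = "http://localhost:8983/solr/admin/collections?" + query  # assumed host and port
print(url)
```

Percent-encoding matters for operands such as `<` and `>`, which are not safe to send unescaped in a query string.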



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

