lucene-dev mailing list archives

From "mosh (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-12993) Split the state.json into 2. a small frequently modified data + a large unmodified data
Date Wed, 17 Apr 2019 11:06:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-12993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16819976#comment-16819976 ]

mosh edited comment on SOLR-12993 at 4/17/19 11:05 AM:
-------------------------------------------------------

{quote}or alternately we can just add this data (status, leader) to the LIR term files. That
way, we don't need to create any new files
{quote}
ZkShardTerms (the class that generates the LIR term files) resides in solr-core, while
ZkStateReader is in SolrJ.
 If state.json is split this way, ZkStateReader would have no way to find out which replica
is the leader,
 since that information would reside only inside the LIR term files, which are handled by a
solr-core class.

I propose two possible courses of action:
 # Move ZkShardTerms to SolrJ, combining LIR terms, shard state and leader.
 # Create new files as proposed by [~noble.paul], which will contain only the small,
frequently modified subset of the split information.
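
As a rough illustration of the second option, the sketch below shows how a client could find a shard leader from the small status file alone, without touching the LIR term files. It assumes the status.json layout shown in the issue description; the function name find_leader and the embedded sample data are hypothetical and are not part of Solr or SolrJ.

```python
import json

# Hypothetical status.json content, shaped like the example in the issue description.
STATUS_JSON = """
{
  "shard1": {
    "state": "active",
    "core_node3": {"state": "active", "leader": true},
    "core_node5": {"state": "active"}
  },
  "shard2": {
    "state": "active",
    "core_node7": {"state": "active"},
    "core_node8": {"state": "active", "leader": true}
  }
}
"""


def find_leader(status, shard):
    """Return the name of the replica flagged as leader for `shard`, or None."""
    for name, value in status.get(shard, {}).items():
        # The shard-level "state" entry is a string; replica entries are objects.
        if isinstance(value, dict) and value.get("leader") is True:
            return name
    return None


status = json.loads(STATUS_JSON)
print(find_leader(status, "shard1"))  # prints "core_node3"
```

Because the shard-level "state" key and the replica names share one object in the proposed layout, the lookup has to skip non-object values; a nested "replicas" key would avoid that, but that would be a change to the proposed format.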

[~noble.paul], [~gus_heck],
 WDYT?


was (Author: moshebla):
{quote}or alternately we can just add this data (status, leader) to the LIR term files. That
way, we don't need to create any new files
{quote}
ZkShardTerms (the class that generates the LIR term files) resides in solr-core, while
ZkStateReader is in SolrJ.
 If state.json is split this way, ZkStateReader would have no way to find out which replica
is the leader,
 since that information would reside only inside the LIR term files, which are handled by a
solr-core class.

I propose two possible courses of action:
 # Move ZkShardTerms to SolrJ, and combine LIR terms
 # Create new files as proposed by [~noble.paul], which will contain a small subset of the
split information.

[~noble.paul], [~gus_heck],
 WDYT?

> Split the state.json into 2. a small frequently modified data + a large unmodified data
> ---------------------------------------------------------------------------------------
>
>                 Key: SOLR-12993
>                 URL: https://issues.apache.org/jira/browse/SOLR-12993
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Noble Paul
>            Priority: Major
>
> This is just a proposal to minimize the ZK load and improve the scalability of very large
> clusters.
> Every time a small state change occurs for a collection/replica, the following file
> ({{state.json}}) needs to be updated once and then read n times (where n = number of
> replicas in this collection). The proposal is to split the main file into 2.
> {code}
> {"gettingstarted":{
>     "pullReplicas":"0",
>     "replicationFactor":"2",
>     "router":{"name":"compositeId"},
>     "maxShardsPerNode":"-1",
>     "autoAddReplicas":"false",
>     "nrtReplicas":"2",
>     "tlogReplicas":"0",
>     "shards":{
>       "shard1":{
>         "range":"80000000-ffffffff",
>       
>         "replicas":{
>           "core_node3":{
>             "core":"gettingstarted_shard1_replica_n1",
>             "base_url":"http://10.0.0.80:8983/solr",
>             "node_name":"10.0.0.80:8983_solr",
>             "state":"active",
>             "type":"NRT",
>             "force_set_state":"false",
>             "leader":"true"},
>           "core_node5":{
>             "core":"gettingstarted_shard1_replica_n2",
>             "base_url":"http://10.0.0.80:7574/solr",
>             "node_name":"10.0.0.80:7574_solr",
>          
>             "type":"NRT",
>             "force_set_state":"false"}}},
>       "shard2":{
>         "range":"0-7fffffff",
>         "state":"active",
>         "replicas":{
>           "core_node7":{
>             "core":"gettingstarted_shard2_replica_n4",
>             "base_url":"http://10.0.0.80:7574/solr",
>             "node_name":"10.0.0.80:7574_solr",
>            
>             "type":"NRT",
>             "force_set_state":"false"},
>           "core_node8":{
>             "core":"gettingstarted_shard2_replica_n6",
>             "base_url":"http://10.0.0.80:8983/solr",
>             "node_name":"10.0.0.80:8983_solr",
>          
>             "type":"NRT",
>             "force_set_state":"false",
>             "leader":"true"}}}}}}
> {code}
> The second file, {{status.json}}, is small and frequently updated.
> {code}
> {
>     "shard1": {
>       "state": "active",
>       "core_node3": {"state": "active", "leader" : true},
>       "core_node5": {"state": "active"}
>     },
>     "shard2": {
>       "state": "active",
>       "core_node7": {"state": "active"},
>       "core_node8": {"state": "active", "leader" : true}}
>   }
> {code}
> Here the file is roughly one tenth the size of the other file. This leads to a dramatic
> reduction in the amount of data written to and read from ZK.
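
To make the proposed split concrete, here is a minimal sketch that separates one collection's state into a large, rarely changing part and a small status part. It assumes the JSON layouts shown above; the function name split_state and the sample data are hypothetical, and this is not how Solr itself implements or would implement the change.

```python
import copy
import json


def split_state(collection_state):
    """Split one collection's state into (static, status) dicts.

    `static` keeps everything except the shard/replica "state" and the
    replica "leader" flag; `status` holds only those volatile fields.
    """
    static = copy.deepcopy(collection_state)  # leave the caller's dict untouched
    status = {}
    for shard_name, shard in static.get("shards", {}).items():
        shard_status = {}
        if "state" in shard:
            shard_status["state"] = shard.pop("state")
        for replica_name, replica in shard.get("replicas", {}).items():
            replica_status = {}
            if "state" in replica:
                replica_status["state"] = replica.pop("state")
            # state.json stores the flag as the string "true";
            # the proposed status.json uses a JSON boolean.
            if replica.pop("leader", "false") == "true":
                replica_status["leader"] = True
            shard_status[replica_name] = replica_status
        status[shard_name] = shard_status
    return static, status


# Hypothetical, shortened sample in the shape of the state.json example.
sample = {
    "replicationFactor": "2",
    "router": {"name": "compositeId"},
    "shards": {
        "shard1": {
            "range": "80000000-ffffffff",
            "state": "active",
            "replicas": {
                "core_node3": {
                    "core": "gettingstarted_shard1_replica_n1",
                    "base_url": "http://10.0.0.80:8983/solr",
                    "node_name": "10.0.0.80:8983_solr",
                    "state": "active",
                    "type": "NRT",
                    "leader": "true",
                },
                "core_node5": {
                    "core": "gettingstarted_shard1_replica_n2",
                    "base_url": "http://10.0.0.80:7574/solr",
                    "node_name": "10.0.0.80:7574_solr",
                    "state": "active",
                    "type": "NRT",
                },
            },
        }
    },
}

static, status = split_state(sample)
print(json.dumps(status, indent=2))
```

With this shape, a replica state change only requires rewriting the status part, while the static part stays untouched until the collection layout itself changes.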



