lucene-dev mailing list archives

From "RAHAT BHALLA (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-10987) Solr Cloud overseer node becomes unreachable. Issue Started Recently
Date Fri, 07 Jul 2017 00:31:00 GMT

     [ https://issues.apache.org/jira/browse/SOLR-10987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

RAHAT BHALLA updated SOLR-10987:
--------------------------------
    Attachment: solr.zip

These are the logs from our outage.

> Solr Cloud overseer node becomes unreachable. Issue Started Recently
> --------------------------------------------------------------------
>
>                 Key: SOLR-10987
>                 URL: https://issues.apache.org/jira/browse/SOLR-10987
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 6.1
>         Environment: *The following is the usage on each of the Solr Nodes:*
> Tasks: 254 total,   1 running, 252 sleeping,   0 stopped,   1 zombie
> %Cpu(s):  0.4 us,  0.3 sy,  0.0 ni, 99.2 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
> KiB Mem : 20392276 total,  4169296 free,  2917012 used, 13305968 buff/cache
> KiB Swap:  5111804 total,  5111636 free,      168 used. 16058184 avail Mem
>   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> 21250 solr      20   0 23.599g 1.184g 228440 S   2.0  6.1  59:55.91 java
> *Solr is running on 5 machines with similar configuration:*
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                4
> On-line CPU(s) list:   0-3
> Thread(s) per core:    1
> Core(s) per socket:    2
> Socket(s):             2
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 62
> Model name:            Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
> Stepping:              4
> CPU MHz:               2799.033
> BogoMIPS:              5600.00
> Hypervisor vendor:     VMware
> Virtualization type:   full
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              25600K
> NUMA node0 CPU(s):     0-3
>            Reporter: RAHAT BHALLA
>            Priority: Blocker
>              Labels: assistance, critical, customer, impacting, issue, need, production
>         Attachments: solr.zip
>
>
> We host a SolrCloud cluster of 5 Solr nodes, with 3 ZooKeeper nodes coordinating the
> cloud. We have over 70 million documents spread across 13 collections, with 40K more
> documents added every day in near real time, at intervals of 5 to 6 minutes.
> The system worked as expected and as required for the last 7 months, until we suddenly
> saw the following exception and all of our instances went offline. We restarted the
> instances and the cloud ran smoothly for three days before it crashed again.
> *The exception logged before it goes down is as follows:*
> 3542285 ERROR (OverseerCollectionConfigSetProcessor-98221003671470081-prod-solr-node01:9080_solr-n_0000000106) [   ] o.a.s.c.OverseerTaskProcessor
> org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /overseer_elect/leader
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
>         at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>         at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
>         at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:348)
>         at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:345)
>         at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
>         at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:345)
>         at org.apache.solr.cloud.OverseerTaskProcessor.amILeader(OverseerTaskProcessor.java:384)
>         at org.apache.solr.cloud.OverseerTaskProcessor.run(OverseerTaskProcessor.java:191)
>         at java.lang.Thread.run(Unknown Source)
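
When going through logs such as the attached ones, it can help to count how often each ZooKeeper error code appears and for which znode path. The following is a minimal sketch, assuming the log lines follow the same "KeeperErrorCode = ... for <path>" shape as the exception quoted above. The function name `count_keeper_errors` and the sample text are illustrative and not part of Solr or ZooKeeper.

```python
import re
from collections import Counter

# Matches e.g. "KeeperErrorCode = ConnectionLoss for /overseer_elect/leader".
# \s+ also tolerates a line break between the error code and the path,
# which can happen when a trace is wrapped by a mail client.
KEEPER_ERROR = re.compile(r"KeeperErrorCode\s*=\s*(\w+)\s+for\s+(\S+)")


def count_keeper_errors(log_text):
    """Return a Counter keyed by (error_code, znode_path)."""
    return Counter(KEEPER_ERROR.findall(log_text))


# Sample input modelled on the exception in this report (assumed format).
sample = (
    "org.apache.zookeeper.KeeperException$ConnectionLossException: "
    "KeeperErrorCode = ConnectionLoss\n"
    "for /overseer_elect/leader\n"
    "        at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)\n"
)

counts = count_keeper_errors(sample)
for (code, path), n in counts.items():
    print(f"{code} {path}: {n}")
```

Running the same function over the full text of each node's log would show whether ConnectionLoss is confined to /overseer_elect/leader or also affects other paths, which may help narrow down where the ZooKeeper connection is being lost.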



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

