lucene-solr-user mailing list archives

From Doss <itsmed...@gmail.com>
Subject SOLR Cloud: 1500+ threads are in TIMED_WAITING status
Date Wed, 04 Apr 2018 05:46:22 GMT
We have a SOLR (7.0.1) cloud on 3 Linux VM instances with 4 CPU, 90 GB RAM,
with a zookeeper (3.4.11) ensemble running on the same machines. We have 130
cores with an overall size of 45 GB. There is no sharding; almost all VMs
have the same copy of the data. These nodes are behind a load balancer.

Index Config:
=============

<ramBufferSizeMB>300</ramBufferSizeMB>
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
       <int name="maxMergeAtOnce">30</int>
       <int name="maxMergeAtOnceExplicit">100</int>
       <double name="segmentsPerTier">30.0</double>
</mergePolicyFactory>

<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
       <int name="maxMergeCount">18</int>
       <int name="maxThreadCount">6</int>
</mergeScheduler>

Commit Configs:
===============
<autoCommit>
       <maxTime>${solr.autoCommit.maxTime:600000}</maxTime>
       <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
       <maxTime>${solr.autoSoftCommit.maxTime:60000}</maxTime>
</autoSoftCommit>


We do 3500 inserts/updates per second, spread across all 130 cores. We have
yet to start using selects effectively.
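
To put the commit settings next to the update rate, here is a rough
back-of-the-envelope sketch in Python. It uses only the figures quoted above;
the even spread across cores is an assumption made for illustration, since
the real per-core distribution is not stated.

```python
# Figures taken from the message above. The even spread of updates
# across cores is an assumption; actual per-core rates may differ.
UPDATES_PER_SECOND = 3500
CORE_COUNT = 130
AUTO_COMMIT_MAX_TIME_MS = 600_000       # solr.autoCommit.maxTime default
AUTO_SOFT_COMMIT_MAX_TIME_MS = 60_000   # solr.autoSoftCommit.maxTime default


def updates_per_interval(rate_per_second, interval_ms):
    """Number of updates arriving during one commit interval."""
    return rate_per_second * interval_ms // 1000


per_soft_commit = updates_per_interval(UPDATES_PER_SECOND, AUTO_SOFT_COMMIT_MAX_TIME_MS)
per_hard_commit = updates_per_interval(UPDATES_PER_SECOND, AUTO_COMMIT_MAX_TIME_MS)
per_core_rate = UPDATES_PER_SECOND / CORE_COUNT

print(per_soft_commit)           # updates between two soft commits, all cores
print(per_hard_commit)           # updates between two hard commits, all cores
print(round(per_core_rate, 1))   # average updates per second per core
```

These are totals over all cores; with no sharding and every VM holding a copy
of the data, each update is also forwarded between nodes.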

The problem we are facing is that at times the thread count suddenly
increases heavily, which leaves SOLR unresponsive or returning 503 responses
to client (PHP HTTP cURL) requests.

Today (04-04-2018) the thread dump shows that the peak went up to 13000+

Please help me fix this issue. Thanks!


Sample Threads:
===============

1. updateExecutor-2-thread-25746-processing-http:////172.10.2.19:8983//solr//profileviews x:profileviews r:core_node2 n:172.10.2.18:8983_solr s:shard1 c:profileviews",
        "state":"TIMED_WAITING",
        "lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@297be1d5",
        "cpuTime":"162.4371ms",
        "userTime":"120.0000ms",
        "stackTrace":["sun.misc.Unsafe.park(Native Method)",
          "java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)",
          "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)",

2. ERROR true HttpSolrCall
null:org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException:
Async exception during distributed update: Error from server at
172.10.2.18:8983/solr/profileviews: Server Error request:
http://172.10.2.18:8983/solr/profileviews/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F172.10.2.19%3A8983%2Fsolr%2Fprofileviews%2F&wt=javabin&version=2
Remote error message: empty String

3. So many threads like:
"name":"qtp959447386-21",
        "state":"TIMED_WAITING",
        "lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@6a1a2bf4",
        "cpuTime":"4522.0837ms",
        "userTime":"3770.0000ms",
        "stackTrace":["sun.misc.Unsafe.park(Native Method)",
          "java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)",
          "java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)",
          "org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:392)",
          "org.eclipse.jetty.util.thread.QueuedThreadPool.idleJobPoll(QueuedThreadPool.java:563)",
          "org.eclipse.jetty.util.thread.QueuedThreadPool.access$800(QueuedThreadPool.java:48)",
          "org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)",
          "java.lang.Thread.run(Thread.java:748)"

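With thousands of threads in a dump, it may help to count them by pool and
state before reading individual stacks. The following is a minimal Python
sketch, assuming the dump has already been parsed into a list of dicts with
"name" and "state" keys like the excerpts above. The helper names pool_name
and tally and the third sample entry are hypothetical, added only for
illustration.

```python
import re
from collections import Counter


def pool_name(thread_name):
    """Reduce a thread name to its pool prefix so threads group together."""
    match = re.match(r"(updateExecutor-\d+|qtp\d+)", thread_name)
    return match.group(1) if match else thread_name


def tally(threads):
    """Count threads per (pool, state) pair."""
    return Counter((pool_name(t["name"]), t["state"]) for t in threads)


sample = [
    # Names and states copied from the excerpts above (first name shortened).
    {"name": "updateExecutor-2-thread-25746-processing-x:profileviews",
     "state": "TIMED_WAITING"},
    {"name": "qtp959447386-21", "state": "TIMED_WAITING"},
    # Hypothetical entry, not from the original dump.
    {"name": "qtp959447386-22", "state": "RUNNABLE"},
]

counts = tally(sample)
for (pool, state), n in sorted(counts.items()):
    print(pool, state, n)
```

A tally like this shows whether the growth is mostly in the updateExecutor
pool or in the Jetty qtp pool. Note that a qtp thread parked in
QueuedThreadPool.idleJobPoll, as in sample 3, is waiting for work rather than
processing a request.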