lucene-solr-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Will Miller <wmil...@fbbrands.com>
Subject Ping handler queuing/problem?
Date Mon, 03 Apr 2017 20:46:55 GMT
I have 3 different SolrCloud clusters that share a single set (3) of zookeeper servers. Each
SolrCloud cluster has its own set of collections stored on Zookeeper. Twice in the past week
all 3 clusters have had about a 1 minute period where all requests stopped coming in. Solr
recovers and further requests are fine.

Inbound connections to the SolrCloud clusters are via load balancers. One of the clusters
uses the /Ping request handler, the other two use TCP checks on port 8983 (yes, this is not
ideal). The logs from the 2 clusters that use TCP for health check don't have anything notable
in the logs except a lack of inbound requests. The logs from the cluster that has the health
check to the Ping handler shows something odd at the time of the issue:

I cleaned out all requests and mutl-line errors and sorted alphabetically to line up the timestamps.
You can see the Ping requests come in every 5 seconds then the Ping response for 9 of them
seem to queue up inside of Solr which then causes the VMs to come out of the load balancer
and stop taking requests:

INFO  - 2017-04-03 15:08:03.255; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:03.255; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=0
INFO  - 2017-04-03 15:08:08.256; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:13.254; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:18.247; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:23.250; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:28.256; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:33.263; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:38.274; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:43.266; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:48.279; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:53.289; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:54.188; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=10922
INFO  - 2017-04-03 15:08:54.188; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=15914
INFO  - 2017-04-03 15:08:54.188; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=20925
INFO  - 2017-04-03 15:08:54.188; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=25932
INFO  - 2017-04-03 15:08:54.188; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=30938
INFO  - 2017-04-03 15:08:54.188; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=35941
INFO  - 2017-04-03 15:08:54.188; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=40934
INFO  - 2017-04-03 15:08:54.188; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=5909
INFO  - 2017-04-03 15:08:54.188; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=899
INFO  - 2017-04-03 15:08:54.204; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=45948
INFO  - 2017-04-03 15:08:54.450; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.servlet.SolrDispatchFilter;
Unable to write response, client closed connection or we are shutting down
INFO  - 2017-04-03 15:08:54.450; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.servlet.SolrDispatchFilter;
Unable to write response, client closed connection or we are shutting down
INFO  - 2017-04-03 15:08:54.450; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.servlet.SolrDispatchFilter;
Unable to write response, client closed connection or we are shutting down
INFO  - 2017-04-03 15:08:54.450; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.servlet.SolrDispatchFilter;
Unable to write response, client closed connection or we are shutting down
INFO  - 2017-04-03 15:08:54.450; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.servlet.SolrDispatchFilter;
Unable to write response, client closed connection or we are shutting down
INFO  - 2017-04-03 15:08:54.450; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.servlet.SolrDispatchFilter;
Unable to write response, client closed connection or we are shutting down
INFO  - 2017-04-03 15:08:54.450; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.servlet.SolrDispatchFilter;
Unable to write response, client closed connection or we are shutting down
INFO  - 2017-04-03 15:08:54.450; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.servlet.SolrDispatchFilter;
Unable to write response, client closed connection or we are shutting down
INFO  - 2017-04-03 15:08:54.450; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.servlet.SolrDispatchFilter;
Unable to write response, client closed connection or we are shutting down
INFO  - 2017-04-03 15:08:59.207; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:08:59.207; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=0
INFO  - 2017-04-03 15:09:04.215; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:09:04.229; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=14
INFO  - 2017-04-03 15:09:09.229; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} hits=272027 status=0 QTime=0
INFO  - 2017-04-03 15:09:09.229; [products shard1 core_node1 products_shard1_replica1] org.apache.solr.core.SolrCore;
[products_shard1_replica1] webapp=/solr path=/admin/ping params={} status=0 QTime=0

I can provide more log entries before and after showing commits, requests and index updates
if needed. The same pattern can be seen across all 8 VMs though at roughly the same time.
This feels like it could be load balancer or network related but wanted to run it by everyone
to see if this potentially could be something on Solr or Zookeeper side.

The ping request handler is:

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
   <lst name="invariants">
     <str name="qt">/select</str><!-- handler to delegate to -->
     <str name="q">*:*</str>
   </lst>
</requestHandler>

Thanks,
Will


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message