lucene-solr-user mailing list archives

From Alain Rogister <alain.rogis...@gmail.com>
Subject stress testing Solr 4.x
Date Fri, 07 Dec 2012 21:07:48 GMT
I am reporting the results of my stress tests against Solr 4.x. As I was
getting many error conditions with 4.0, I switched to the 4.1 trunk in the
hope that some of the issues would be fixed already. Here is my setup:

- Everything running on a single box (2 x 4-core CPUs, 8 GB RAM). I realize
this is not representative of a production environment, but it's a fine way
to find out what happens under resource-constrained conditions.
- 3 Solr servers, 3 cores (2 of which are very small; the third one has 410
MB of data)
- single shard
- 3 ZooKeeper instances
- HAProxy load balancing requests across Solr servers
- JMeter or ApacheBench running the tests: 5 thread pools of 20 threads
each, sending search requests continuously (no updates)
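
The load pattern described above (5 pools of 20 threads, search requests only) can be approximated with a short script. This is only a sketch, not the JMeter or ApacheBench configuration used in these tests: the URL, the function names, and the per-thread request count are assumptions made for illustration.

```python
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical load-balancer URL; the real HAProxy address is not given here.
SEARCH_URL = "http://localhost:8080/solr/adressage/select?q=*:*"


def http_search(url):
    """Send one search request and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status


def run_load(send, url, pools=5, threads_per_pool=20, requests_per_thread=100):
    """Run `pools` thread pools of `threads_per_pool` workers each.

    Every worker calls send(url) `requests_per_thread` times.
    Returns a tuple (ok, failed) of request counts.
    """
    ok = failed = 0
    lock = threading.Lock()

    def worker():
        nonlocal ok, failed
        for _ in range(requests_per_thread):
            try:
                success = send(url) == 200
            except Exception:
                # Connection refused, HTTP errors, timeouts: count as failed.
                success = False
            with lock:
                if success:
                    ok += 1
                else:
                    failed += 1

    executors = [ThreadPoolExecutor(max_workers=threads_per_pool)
                 for _ in range(pools)]
    futures = [ex.submit(worker)
               for ex in executors
               for _ in range(threads_per_pool)]
    for future in futures:
        future.result()  # re-raise any unexpected worker error
    for ex in executors:
        ex.shutdown()
    return ok, failed
```

Calling `run_load(http_search, SEARCH_URL)` would send 5 x 20 x 100 requests; the `send` argument is injectable so that the counting logic can be exercised without a running cluster.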

In nominal conditions, it all works fine, i.e. it can process a million
requests, maxing out the CPUs at all times, without experiencing nasty
failures. There are errors in the logs about replication failures, though;
they should be benign in this case as no updates are taking place, but it's
hard to tell what is going on exactly. Example:

Dec 07, 2012 7:50:37 PM org.apache.solr.update.PeerSync handleResponse
WARNING: PeerSync: core=adressage url=http://192.168.0.101:8983/solr exception talking to http://192.168.0.101:8985/solr/adressage/, failed
org.apache.solr.common.SolrException: Server at http://192.168.0.101:8985/solr/adressage returned non ok status:404, message:Not Found
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:166)
at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)

Then I simulated various failure scenarios:

- 1 Solr server stop/start
- 2 Solr servers stop/start
- 3 Solr servers stop/start: in this case, it seems that the Solr servers
*cannot* be restarted. More precisely, the restarted server considers
itself number 1 out of 4 and waits for the other 3 to come up. The only
way out is to stop it again, then stop all ZooKeeper instances *and* clean
up their zkdata directories, start them, then start the Solr servers.

I noticed that these zkdata directories had grown to 200 MB after a while.
What exactly is in there besides the configuration data? Does it stop
growing?
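
One way to see what takes up the space is to total file sizes by name prefix. The sketch below assumes the standard ZooKeeper on-disk layout, where snapshots are written as `snapshot.<zxid>` and transaction logs as `log.<zxid>` under a `version-2` subdirectory; the function name and the example path are hypothetical.

```python
import os
from collections import defaultdict


def summarize_data_dir(path):
    """Return {filename prefix: total bytes} for all files under `path`.

    The prefix is the part of the file name before the first dot, so
    snapshot.* and log.* files are each grouped together.
    """
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(path):
        for name in files:
            prefix = name.split(".", 1)[0]
            totals[prefix] += os.path.getsize(os.path.join(root, name))
    return dict(totals)


# Example (path is an assumption):
# for prefix, size in sorted(summarize_data_dir("/path/to/zkdata").items()):
#     print(f"{prefix}: {size / 1e6:.1f} MB")
```

If most of the bytes fall under the `log` and `snapshot` prefixes, the growth comes from ZooKeeper's own transaction history rather than from the Solr configuration data.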

Then I tried this:

- kill 1 ZooKeeper process
- kill 2 ZooKeeper processes
- stop/start 1 Solr server

When doing this, I experienced (many times) situations where the Solr
servers could not reconnect and threw scary exceptions. The only way out
was to restart the whole cluster.
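
One thing worth keeping in mind for this scenario: a ZooKeeper ensemble only serves requests while a strict majority of its members is up, so with 3 instances, killing 2 leaves the ensemble without a quorum. The arithmetic is sketched below; the function names are illustrative, and this is a likely contributor to the behavior rather than a confirmed diagnosis.

```python
def quorum_size(ensemble_size):
    """Smallest number of members that forms a strict majority."""
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size, alive):
    """True if `alive` members are enough for the ensemble to serve requests."""
    return alive >= quorum_size(ensemble_size)


for killed in range(3):
    alive = 3 - killed
    print(f"{killed} killed, {alive} alive: quorum={has_quorum(3, alive)}")
# 0 killed, 3 alive: quorum=True
# 1 killed, 2 alive: quorum=True
# 2 killed, 1 alive: quorum=False
```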

Q: When, if ever, is one supposed to clean up the zkdata directories?

Here are the errors I found in the logs. It seems that some of them have
been reported in JIRA but 4.1-trunk seems to experience basically the same
issues as 4.0 in my test scenarios.

Dec 07, 2012 8:03:59 PM org.apache.solr.update.PeerSync handleResponse
WARNING: PeerSync: core=cachede url=http://192.168.0.101:8983/solr couldn't connect to http://192.168.0.101:8984/solr/cachede/, counting as success
Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log
SEVERE: Sync request error: org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://192.168.0.101:8984/solr/cachede
Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log
SEVERE: http://192.168.0.101:8983/solr/cachede/: Could not tell a replica to recover:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://192.168.0.101:8984/solr
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at org.apache.solr.cloud.SyncStrategy$1.run(SyncStrategy.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.http.conn.HttpHostConnectException: Connection to http://192.168.0.101:8984 refused
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
... 5 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:123)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
... 13 more

Dec 07, 2012 8:03:59 PM org.apache.solr.update.PeerSync handleResponse
WARNING: PeerSync: core=adressage url=http://192.168.0.101:8983/solr  got a 404 from http://192.168.0.101:8985/solr/adressage/, counting as success
Dec 07, 2012 8:03:59 PM org.apache.solr.common.SolrException log
SEVERE: Sync request error: org.apache.solr.common.SolrException: Server at http://192.168.0.101:8985/solr/adressage returned non ok status:404, message:Not Found
Dec 07, 2012 8:04:00 PM org.apache.solr.update.PeerSync handleResponse
WARNING: PeerSync: core=formabanque url=http://192.168.0.101:8983/solr  got a 404 from http://192.168.0.101:8985/solr/formabanque/, counting as success
Dec 07, 2012 8:04:00 PM org.apache.solr.common.SolrException log
SEVERE: Sync request error: org.apache.solr.common.SolrException: Server at http://192.168.0.101:8985/solr/formabanque returned non ok status:404, message:Not Found

Dec 07, 2012 8:04:32 PM org.apache.solr.update.PeerSync sync
WARNING: no frame of reference to tell of we've missed updates

Dec 07, 2012 8:03:58 PM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://192.168.0.101:8984/solr/adressage
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
at org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:182)
at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:134)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:407)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:222)
Caused by: org.apache.http.conn.HttpHostConnectException: Connection to http://192.168.0.101:8984 refused
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
at org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
... 6 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
at java.net.Socket.connect(Socket.java:579)
at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:123)
at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
... 14 more

Dec 07, 2012 8:03:58 PM org.apache.solr.cloud.RecoveryStrategy doRecovery
SEVERE: Recovery failed - trying again... (0) core=adressage

SEVERE: Error getting leader from zk
org.apache.solr.common.SolrException: Could not get leader props
at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:735)
at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:699)
at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:664)
at org.apache.solr.cloud.ZkController.register(ZkController.java:603)
at org.apache.solr.cloud.ZkController.register(ZkController.java:558)
at org.apache.solr.core.CoreContainer.registerInZk(CoreContainer.java:791)
at org.apache.solr.core.CoreContainer.register(CoreContainer.java:775)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:567)
at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:562)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /collections/adressage/leaders/shard1
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:244)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:241)
at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:63)
at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:241)
at org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:713)
... 16 more

Dec 07, 2012 4:39:23 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:
at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:159)
at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:133)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
