lucene-solr-user mailing list archives

From Arkadi Colson <ark...@smartbit.be>
Subject Re: Replication in SolrCloud
Date Mon, 03 Dec 2012 10:05:52 GMT
Thanks for the explanation. It's clear now...

I expanded the setup to:
4 hosts with 2 shards and 1 replica for each shard. When I shut down
Tomcat on solr01-dcg, which is the master of shard 1 for both
collections, the replica (solr01-gs) does NOT seem to take over.
See logs below.

Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext 
runLeaderProcess
INFO: Running the leader process.
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext 
shouldIBeLeader
INFO: Checking if I should try and be the leader.
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext 
shouldIBeLeader
INFO: My last published State was Active, it's okay to be the leader.
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext 
runLeaderProcess
INFO: I may be the new leader - try and sync
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.SyncStrategy sync
INFO: Sync replicas to http://solr01-gs:8983/solr/intradesk/
Dec 3, 2012 9:55:34 AM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=intradesk url=http://solr01-gs:8983/solr START 
replicas=[http://solr01-dcg:8983/solr/intradesk/] nUpdates=100
Dec 3, 2012 9:55:34 AM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=intradesk url=http://solr01-gs:8983/solr DONE.  We 
have no versions.  sync failed.
Dec 3, 2012 9:55:34 AM org.apache.solr.common.SolrException log
SEVERE: Sync Failed
Dec 3, 2012 9:55:34 AM org.apache.solr.cloud.ShardLeaderElectionContext 
rejoinLeaderElection
INFO: There is a better leader candidate than us - going back into recovery
Dec 3, 2012 9:55:35 AM org.apache.solr.update.DefaultSolrCoreState 
doRecovery
INFO: Running recovery - first canceling any ongoing recovery
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy run
INFO: Starting recovery process.  core=intradesk 
recoveringAfterStartup=false
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Attempting to PeerSync from http://solr01-dcg:8983/solr/intradesk/ 
core=intradesk - recoveringAfterStartup=false
Dec 3, 2012 9:55:35 AM org.apache.solr.update.PeerSync sync
INFO: PeerSync: core=intradesk url=http://solr01-gs:8983/solr START 
replicas=[http://solr01-dcg:8983/solr/intradesk/] nUpdates=100
Dec 3, 2012 9:55:35 AM org.apache.solr.update.PeerSync sync
WARNING: no frame of reference to tell of we've missed updates
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: PeerSync Recovery was not successful - trying replication. 
core=intradesk
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Starting Replication Recovery. core=intradesk
Dec 3, 2012 9:55:35 AM org.apache.solr.client.solrj.impl.HttpClientUtil 
createClient
INFO: Creating new http client, 
config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Dec 3, 2012 9:55:35 AM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover. 
core=intradesk:org.apache.solr.client.solrj.SolrServerException: Server 
refused connection at: http://solr01-dcg:8983/solr
     at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
     at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
     at 
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
     at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
     at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
Caused by: org.apache.http.conn.HttpHostConnectException: Connection to 
http://solr01-dcg:8983 refused
     at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
     at 
org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
     at 
org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
     at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575)
     at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
     at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
     at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
     at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
     at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
     ... 4 more
Caused by: java.net.ConnectException: Connection refused
     at java.net.PlainSocketImpl.socketConnect(Native Method)
     at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
     at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
     at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
     at java.net.Socket.connect(Socket.java:529)
     at 
org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:123)
     at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
     ... 12 more

Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
SEVERE: Recovery failed - trying again... core=intradesk
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.ShardLeaderElectionContext 
runLeaderProcess
INFO: Running the leader process.
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.ShardLeaderElectionContext 
waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 
timeoutin=179999
Dec 3, 2012 9:55:35 AM org.apache.solr.cloud.ShardLeaderElectionContext 
waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 
timeoutin=179497
Dec 3, 2012 9:55:36 AM org.apache.solr.cloud.ShardLeaderElectionContext 
waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 
timeoutin=178995
Dec 3, 2012 9:55:36 AM org.apache.solr.cloud.ShardLeaderElectionContext 
waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 
timeoutin=178493
Dec 3, 2012 9:55:37 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
INFO: Starting Replication Recovery. core=intradesk
Dec 3, 2012 9:55:37 AM org.apache.solr.client.solrj.impl.HttpClientUtil 
createClient
INFO: Creating new http client, 
config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false
Dec 3, 2012 9:55:37 AM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover. 
core=intradesk:org.apache.solr.client.solrj.SolrServerException: Server 
refused connection at: http://solr01-dcg:8983/solr
     at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:406)
     at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
     at 
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:199)
     at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:388)
     at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:220)
Caused by: org.apache.http.conn.HttpHostConnectException: Connection to 
http://solr01-dcg:8983 refused
     at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:158)
     at 
org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:150)
     at 
org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:121)
     at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:575)
     at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:425)
     at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
     at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
     at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
     at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:352)
     ... 4 more
Caused by: java.net.ConnectException: Connection refused
     at java.net.PlainSocketImpl.socketConnect(Native Method)
     at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
     at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
     at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
     at java.net.Socket.connect(Socket.java:529)
     at 
org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:123)
     at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:148)
     ... 12 more

Dec 3, 2012 9:55:37 AM org.apache.solr.cloud.RecoveryStrategy doRecovery
SEVERE: Recovery failed - trying again... core=intradesk
...
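
The repeated waitForReplicasToComeUp lines in the log above can be read mechanically. What follows is only a minimal sketch: it assumes the timeoutin value is a countdown in milliseconds (the log does not state the unit), and the names LOG_LINES, WAIT_RE and parse_wait_lines are made up for illustration. It pulls out how many replicas are expected, how many were found, and how much of the wait remains.

```python
import re

# Log excerpts copied from the message above (wrapped lines rejoined).
LOG_LINES = [
    "INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=179999",
    "INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=179497",
    "INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178995",
    "INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178493",
]

# Hypothetical helper pattern: matches the three counters in each line.
WAIT_RE = re.compile(r"total=(\d+) found=(\d+) timeoutin=(\d+)")


def parse_wait_lines(lines):
    """Return a (total, found, timeoutin) tuple for every matching line."""
    parsed = []
    for line in lines:
        match = WAIT_RE.search(line)
        if match:
            parsed.append(tuple(int(group) for group in match.groups()))
    return parsed


waits = parse_wait_lines(LOG_LINES)
total, found, timeoutin = waits[-1]
missing = total - found
# Assumption: timeoutin is in milliseconds.
remaining_seconds = timeoutin / 1000.0
# Difference between consecutive countdown values, i.e. the polling step.
steps = [a[2] - b[2] for a, b in zip(waits, waits[1:])]

print(f"replicas missing: {missing}")
print(f"time left before the wait gives up: about {remaining_seconds:.0f}s")
print(f"countdown step between log lines: {steps}")
```

Read this way, the node is still waiting for one of two replicas and has most of its wait window left, which would fit the shard staying without a leader for a while after solr01-dcg goes down.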



Any idea why solr stops responding?
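
One quick way to confirm what the stack traces report, namely that nothing is accepting TCP connections on the other node's Solr port, is a plain socket probe. This is a sketch only: is_port_open is a hypothetical helper, not part of Solr, and the demo runs against a throwaway local listener so it does not depend on the hosts in this thread.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Demo against a temporary local listener so the snippet runs anywhere.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

open_while_listening = is_port_open("127.0.0.1", demo_port)
listener.close()
open_after_close = is_port_open("127.0.0.1", demo_port)

print(open_while_listening, open_after_close)
```

For the setup described here, the equivalent call from solr01-gs would be is_port_open("solr01-dcg", 8983); a False result while Tomcat is stopped would match the "Connection refused" errors in the log.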


On 11/30/2012 04:57 PM, Mark Miller wrote:
> Thanks for all the detailed info!
>
> Yes, that is confusing. One of the sore points we have while supporting both std Solr
> and SolrCloud mode.
>
> In SolrCloud, every node is a Master when thinking about std Solr replication. However,
> as you see on the cloud page, only one of them is a *leader*. A leader is different than a
> master.
>
> Being a Master when it comes to the replication handler simply means you can replicate
> the index to other nodes - in SolrCloud we need every node to be capable of doing that. Each
> shard only has one leader, but every node in your cluster will be a replication master.
>
> - Mark
>
>
> On Nov 30, 2012, at 10:32 AM, Arkadi Colson <arkadi@smartbit.be> wrote:
>
>> This is my setup for solrCloud 4.0 on Tomcat 7.0.33 and zookeeper 3.4.5
>>
>> hosts:
>> - solr01-dcg (first started)
>> - solr01-gs (second started, so becomes the replica)
>>
>> collections:
>> - smsc
>>
>> shards:
>> - mydoc
>>
>> zookeeper:
>> - on solr01-dcg
>> - on solr01-gs
>>
>> SOLR_OPTS="-Dsolr.solr.home=/opt/solr/ -Dport=8983 -Dcollection.configName=smsc -DzkClientTimeout=20000 -DzkHost=solr01-dcg:2181,solr01-gs:2181"
>>
>> solr.xml:
>> <?xml version="1.0" encoding="UTF-8" ?>
>> <solr persistent="true">
>>    <cores adminPath="/admin/cores" zkClientTimeout="20000" hostPort="8983">
>>      <core schema="schema.xml" shard="shard1" instanceDir="/solr/mydoc/" name="mydoc" config="solrconfig.xml" collection="mydoc"/>
>>    </cores>
>> </solr>
>>
>> I upload the config to zookeeper:
>> java -classpath .:/usr/local/tomcat/webapps/solr/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost solr01-dcg:2181,solr01-gs:2181 -confdir /opt/solr/conf -confname smsc
>>
>> Linking the config to the collection:
>> java -classpath .:/usr/local/tomcat/webapps/solr/WEB-INF/lib/* org.apache.solr.cloud.ZkCLI -cmd linkconfig -collection mydoc -zkhost solr01-dcg.intnet.smartbit.be:2181,solr01-gs.intnet.smartbit.be:2181 -confname smsc
>>
>> cloud on both hosts:
>>
>> <dcddagii.png>
>>
>> solr01-dcg
>>
>> <hhfgdeab.png>
>>
>> solr01-gs:
>>
>> <daafhdef.png>
>> Any idea?
>>
>> Thanks!
>>
>> On 11/30/2012 03:15 PM, Mark Miller wrote:
>>> On Nov 30, 2012, at 5:08 AM, Arkadi Colson <arkadi@smartbit.be>
>>>   wrote:
>>>
>>>
>>>> Hi
>>>>
>>>> I've set up a simple 2-machine cloud with 1 shard, one replica and 2 collections. Everything
>>>> went fine. However, when I look at the interface,
>>>> http://localhost:8983/solr/#/coll1/replication
>>>> is reporting that both machines are master. Did I do something wrong in
>>>> my config, or is it a report for manual replication configuration? Can someone else check this?
>>>>
>>> How? You don't really give anything to look at :)
>>>
>>>
>>>> Is it possible to link 2 collections to the same conf in zookeeper?
>>>>
>>>>
>>> Yes, that is no problem.
>>>
>>> - Mark
>>>
>>>
>>>
>>>
>
>

-- 
Kind regards

Arkadi Colson

Smartbit bvba . Hoogstraat 13 . 3670 Meeuwen
T +32 11 64 08 80 . F +32 11 64 08 81

