lucene-solr-user mailing list archives

From Jamie Johnson <jej2...@gmail.com>
Subject Re: Solr 4.2 Cloud Replication Replica has higher version than Master?
Date Wed, 03 Apr 2013 02:07:23 GMT
I brought the bad one down and back up and it did nothing.  I can clear the
index and try 4.2.1. I will save off the logs and see if there is anything
else odd.
On Apr 2, 2013 9:13 PM, "Mark Miller" <markrmiller@gmail.com> wrote:

> It would appear it's a bug given what you have said.
>
> Any other exceptions would be useful. Might be best to start tracking in a
> JIRA issue as well.
>
> To fix, I'd bring the behind node down and back again.
>
> Unfortunately, I'm pressed for time, but we really need to get to the
> bottom of this and fix it, or determine if it's fixed in 4.2.1 (spreading
> to mirrors now).
>
> - Mark
>
> On Apr 2, 2013, at 7:21 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>
> > Sorry I didn't ask the obvious question.  Is there anything else that I
> > should be looking for here and is this a bug?  I'd be happy to troll
> > through the logs further if more information is needed, just let me know.
> >
> > Also, what is the most appropriate mechanism to fix this?  Is it required to
> > kill the index that is out of sync and let Solr resync things?
> >
> >
> > On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >
> >> sorry for spamming here....
> >>
> >> shard5-core2 is the instance we're having issues with...
> >>
> >> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
> >> SEVERE: shard update error StdNode: http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException: Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok status:503, message:Service Unavailable
> >>        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
> >>        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> >>        at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
> >>        at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
> >>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >>        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
> >>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> >>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> >>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> >>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> >>        at java.lang.Thread.run(Thread.java:662)
> >>
> >>
> >> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>
> >>> here is another one that looks interesting
> >>>
> >>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
> >>> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are the leader, but locally we don't think so
> >>>        at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
> >>>        at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
> >>>        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
> >>>        at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
> >>>        at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
> >>>        at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
> >>>        at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
> >>>        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >>>        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >>>        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
> >>>        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
> >>>        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
> >>>
> >>>
> >>>
> >>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>
> >>>> Looking at the master it looks like at some point there were shards that
> >>>> went down.  I am seeing things like what is below.
> >>>>
> >>>> INFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 12)
> >>>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
> >>>> INFO: Updating live nodes... (9)
> >>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
> >>>> INFO: Running the leader process.
> >>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
> >>>> INFO: Checking if I should try and be the leader.
> >>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
> >>>> INFO: My last published State was Active, it's okay to be the leader.
> >>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
> >>>> INFO: I may be the new leader - try and sync
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmiller@gmail.com
> >wrote:
> >>>>
> >>>>> not look at that - it looks at version numbers for updates in the
> >>>>> transaction log - it compares the last 100 of them on leader and replica.
> >>>>> What it's saying is that the replica seems to have versions that the
> >>>>> leader does not. Have you scanned the logs for any interesting exceptions?
> >>>>>
> >>>>> Did the leader change during the heavy indexing? Did any zk session
> >>>>> timeouts occur?
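
The last-100 comparison described above can be sketched roughly as follows. This is an illustrative model only, not Solr's actual PeerSync code; the function name and return values are invented for the sketch:

```python
# Illustrative sketch of the PeerSync idea: each node keeps the last
# N update versions from its transaction log; a replica compares its
# list against the leader's to decide whether it is missing updates.

def peer_sync_decision(our_versions, other_versions, n_updates=100):
    """Return 'in_sync', 'fetch_updates', or 'ours_newer'."""
    ours = sorted(our_versions, reverse=True)[:n_updates]
    theirs = sorted(other_versions, reverse=True)[:n_updates]
    if not theirs:
        return "in_sync"
    our_low = ours[-1] if ours else 0  # oldest version in our window
    other_high = theirs[0]             # newest version the peer has
    # If everything the peer has is older than our whole window, we
    # believe our index is newer and abort the sync - which matches
    # the "Our versions are newer" log line quoted later in this thread.
    if other_high < our_low:
        return "ours_newer"
    missing = [v for v in theirs if v not in set(ours)]
    return "fetch_updates" if missing else "in_sync"
```

In this model the pathological case in this thread is a replica whose version window sits entirely above the leader's, so it refuses to fetch anything even though it has fewer documents.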
> >>>>>
> >>>>> - Mark
> >>>>>
> >>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2003@gmail.com> wrote:
> >>>>>
> >>>>>> I am currently looking at moving our Solr cluster to 4.2 and noticed a
> >>>>>> strange issue while testing today.  Specifically the replica has a higher
> >>>>>> version than the master which is causing the index to not replicate.
> >>>>>> Because of this the replica has fewer documents than the master.  What
> >>>>>> could cause this and how can I resolve it short of taking down the index
> >>>>>> and scp'ing the right version in?
> >>>>>>
> >>>>>> MASTER:
> >>>>>> Last Modified:about an hour ago
> >>>>>> Num Docs:164880
> >>>>>> Max Doc:164880
> >>>>>> Deleted Docs:0
> >>>>>> Version:2387
> >>>>>> Segment Count:23
> >>>>>>
> >>>>>> REPLICA:
> >>>>>> Last Modified: about an hour ago
> >>>>>> Num Docs:164773
> >>>>>> Max Doc:164773
> >>>>>> Deleted Docs:0
> >>>>>> Version:3001
> >>>>>> Segment Count:30
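
One way to sanity-check a numDocs mismatch like the one above is to query each core directly for its live document count. A minimal sketch, assuming a standard /select handler returning JSON; the helper names are ours, not Solr's:

```python
# Diagnostic sketch: compare live document counts across the cores of
# one shard before deciding to force a recovery. The core URLs shown
# in the usage comment are the ones from this thread.
import json
from urllib.request import urlopen

def parse_num_docs(select_json):
    """numFound for q=*:* with rows=0 is the live document count."""
    return json.loads(select_json)["response"]["numFound"]

def num_docs(core_url):
    # Standard Solr query handler: match everything, return no rows.
    with urlopen(core_url + "/select?q=*:*&rows=0&wt=json") as resp:
        return parse_num_docs(resp.read())

# Usage (against the cores in this thread):
# for url in ["http://10.38.33.16:7575/solr/dsc-shard5-core1",
#             "http://10.38.33.17:7577/solr/dsc-shard5-core2"]:
#     print(url, num_docs(url))
```

If the counts disagree while both cores report themselves active, that mirrors the replication failure described in this message.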
> >>>>>>
> >>>>>> in the replicas log it says this:
> >>>>>>
> >>>>>> INFO: Creating new http client, config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
> >>>>>>
> >>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
> >>>>>>
> >>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr START replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
> >>>>>>
> >>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
> >>>>>>
> >>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr  Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
> >>>>>>
> >>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
> >>>>>>
> >>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr  Our versions are newer. ourLowThreshold=1431233788792274944 otherHigh=1431233789440294912
> >>>>>>
> >>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
> >>>>>>
> >>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr DONE. sync succeeded
> >>>>>>
> >>>>>>
> >>>>>> which again seems to indicate that it thinks it has a newer version of
> >>>>>> the index, so it aborts.  This happened while having 10 threads indexing
> >>>>>> 10,000 items, writing to a 6 shard (1 replica each) cluster.  Any
> >>>>>> thoughts on this or what I should look for would be appreciated.
> >>>>>
> >>>>>
> >>>>
> >>>
> >>
>
>
