lucene-solr-user mailing list archives

From Jamie Johnson <jej2...@gmail.com>
Subject Re: Solr 4.2 Cloud Replication Replica has higher version than Master?
Date Wed, 03 Apr 2013 19:42:55 GMT
Answered my own question: it now says compositeId. What is problematic,
though, is that in addition to my shards (named, say, jamie-shard1) I see
the Solr-created shards (shard1). I assume these were created because of
the numShards param. Is there no way to specify the names of these
shards?


On Wed, Apr 3, 2013 at 3:25 PM, Jamie Johnson <jej2003@gmail.com> wrote:

> Ah, interesting... so I need to specify numShards, blow out ZK, and then
> try this again to see if things work properly now. What is really strange
> is that for the most part things have worked right, and on 4.2.1 I have
> 600,000 items indexed with no duplicates. In any event, I will specify
> numShards, clear out ZK, and begin again. If this works properly, what
> should the router type be?
>
>
> On Wed, Apr 3, 2013 at 3:14 PM, Mark Miller <markrmiller@gmail.com> wrote:
>
>> If you don't specify numShards after 4.1, you get an implicit doc router
>> and it's up to you to distribute updates. In the past, partitioning was
>> done on the fly, but for shard splitting and perhaps other features, we
>> now divvy up the hash range up front based on numShards and store it in
>> ZooKeeper. Leaving out numShards is now how you take complete control of
>> updates yourself.
>>
>> - Mark
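Mark's description of dividing the hash range up front based on numShards can be illustrated with a minimal Java sketch (Java 16+ for the record type). It assumes an even split of the signed 32-bit hash space and uses String.hashCode() as a stand-in hash; Solr's compositeId router uses its own hash function and its own boundary arithmetic, so the ranges this prints are illustrative, not the ones Solr would store in ZooKeeper. HashRangeSketch, partition and route are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split the signed 32-bit hash space into numShards
// contiguous ranges once, then route each id to the range its hash falls in.
// String.hashCode() is a stand-in; Solr's compositeId router hashes differently.
final class HashRangeSketch {

    /** Inclusive range [min, max] of hash values owned by one shard. */
    record Range(String shard, int min, int max) {
        boolean contains(int hash) {
            return hash >= min && hash <= max;
        }
    }

    /** Splits [Integer.MIN_VALUE, Integer.MAX_VALUE] into numShards contiguous ranges. */
    static List<Range> partition(int numShards) {
        if (numShards < 1) {
            throw new IllegalArgumentException("numShards must be >= 1");
        }
        long step = (1L << 32) / numShards;   // base width of each range
        List<Range> ranges = new ArrayList<>();
        long start = Integer.MIN_VALUE;
        for (int i = 0; i < numShards; i++) {
            // The last range absorbs the remainder so every hash value is covered.
            long end = (i == numShards - 1) ? Integer.MAX_VALUE : start + step - 1;
            ranges.add(new Range("shard" + (i + 1), (int) start, (int) end));
            start = end + 1;
        }
        return ranges;
    }

    /** Returns the shard whose range contains the id's hash. */
    static String route(String id, List<Range> ranges) {
        int hash = id.hashCode();             // stand-in hash (assumption)
        for (Range r : ranges) {
            if (r.contains(hash)) {
                return r.shard();
            }
        }
        throw new IllegalStateException("no range covers hash " + hash);
    }

    public static void main(String[] args) {
        List<Range> ranges = partition(6);
        for (Range r : ranges) {
            // %08x prints negative ints as unsigned hex, e.g. 80000000-aaaaaaa9
            System.out.printf("%s %08x-%08x%n", r.shard(), r.min(), r.max());
        }
        System.out.println(route("7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de", ranges));
    }
}
```

Because the ranges are computed once and then only looked up, routing the same id always gives the same shard, regardless of how many nodes happen to be live at the time.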
>>
>> On Apr 3, 2013, at 2:57 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>>
>> > The router says "implicit". I did start from a blank ZK state, but
>> > perhaps I missed one of the ZkCLI commands? One of my shards from the
>> > clusterstate.json is shown below. What is the process that should be
>> > done to bootstrap a cluster, other than the ZkCLI commands I listed
>> > above? My process right now is to run those ZkCLI commands and then
>> > start Solr on all of the instances with a command like this:
>> >
>> > java -server -Dshard=shard5 -DcoreName=shard5-core1 \
>> >   -Dsolr.data.dir=/solr/data/shard5-core1 \
>> >   -Dcollection.configName=solr-conf \
>> >   -Dcollection=collection1 -DzkHost=so-zoo1:2181,so-zoo2:2181,so-zoo3:2181 \
>> >   -Djetty.port=7575 -DhostPort=7575 -jar start.jar
>> >
>> > I feel like maybe I'm missing a step.
>> >
>> > "shard5":{
>> >        "state":"active",
>> >        "replicas":{
>> >          "10.38.33.16:7575_solr_shard5-core1":{
>> >            "shard":"shard5",
>> >            "state":"active",
>> >            "core":"shard5-core1",
>> >            "collection":"collection1",
>> >            "node_name":"10.38.33.16:7575_solr",
>> >            "base_url":"http://10.38.33.16:7575/solr",
>> >            "leader":"true"},
>> >          "10.38.33.17:7577_solr_shard5-core2":{
>> >            "shard":"shard5",
>> >            "state":"recovering",
>> >            "core":"shard5-core2",
>> >            "collection":"collection1",
>> >            "node_name":"10.38.33.17:7577_solr",
>> >            "base_url":"http://10.38.33.17:7577/solr"}}}
>> >
>> >
>> > On Wed, Apr 3, 2013 at 2:40 PM, Mark Miller <markrmiller@gmail.com> wrote:
>> >
>> >> It should be part of your clusterstate.json. Some users have reported
>> >> trouble upgrading a previous ZK install when this change came. I
>> >> recommended manually updating the clusterstate.json to have the right
>> >> info, and that seemed to work. Otherwise, I guess you have to start
>> >> from a clean ZK state.
>> >>
>> >> If you don't have that range information, I think there will be
>> >> trouble. Do you have a router type defined in the clusterstate.json?
>> >>
>> >> - Mark
>> >>
>> >> On Apr 3, 2013, at 2:24 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>> >>
>> >>> Where is this information stored in ZK? I don't see it in the cluster
>> >>> state (or perhaps I don't understand it ;) ).
>> >>>
>> >>> Perhaps something with my process is broken. What I do when I start
>> >>> from scratch is the following:
>> >>>
>> >>> ZkCLI -cmd upconfig ...
>> >>> ZkCLI -cmd linkconfig ....
>> >>>
>> >>> but I don't ever explicitly create the collection. What should the
>> >>> steps from scratch be? I am moving from an unreleased snapshot of 4.0,
>> >>> so I never did that previously either, so perhaps I did create the
>> >>> collection in one of my steps to get this working but have forgotten
>> >>> it along the way.
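One way to create the collection explicitly, instead of leaving it to startup parameters, is the Collections API CREATE action. This Java sketch only assembles the request URL. The parameter names (action, name, numShards, replicationFactor, collection.configName) are assumed from the 4.x Collections API and should be checked against the 4.2 documentation, especially how replicationFactor is counted; the host, collection and config names are taken from this thread, and CreateCollectionUrl is a hypothetical helper (Java 10+ for the URLEncoder overload).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds a Collections API CREATE URL so the collection,
// and its per-shard hash ranges, are created explicitly with numShards.
// Parameter names are assumed from the 4.x Collections API; verify for 4.2.
final class CreateCollectionUrl {

    static String build(String solrBaseUrl, String name, int numShards,
                        int replicationFactor, String configName) {
        Map<String, String> params = new LinkedHashMap<>(); // preserves insertion order
        params.put("action", "CREATE");
        params.put("name", name);
        params.put("numShards", Integer.toString(numShards));
        params.put("replicationFactor", Integer.toString(replicationFactor));
        params.put("collection.configName", configName);
        String query = params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
        return solrBaseUrl + "/admin/collections?" + query;
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // Java 10+ overload
    }

    public static void main(String[] args) {
        // 6 shards as in this thread; replicationFactor=2 is an assumed value
        // meant as "leader plus one replica"; check the 4.2 semantics.
        System.out.println(build("http://10.38.33.16:7575/solr", "collection1", 6, 2, "solr-conf"));
    }
}
```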
>> >>>
>> >>>
>> >>> On Wed, Apr 3, 2013 at 2:16 PM, Mark Miller <markrmiller@gmail.com> wrote:
>> >>>
>> >>>> Thanks for digging, Jamie. In 4.2, hash ranges are assigned up front
>> >>>> when a collection is created; each shard gets a range, which is
>> >>>> stored in ZooKeeper. You should not be able to end up with the same id
>> >>>> on different shards, so something very odd is going on.
>> >>>>
>> >>>> Hopefully I'll have some time to try and help you reproduce. Ideally
>> >>>> we can capture it in a test case.
>> >>>>
>> >>>> - Mark
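The guarantee Mark describes holds only if the stored ranges are disjoint and together cover every hash value. The Java sketch below checks that property for ranges copied by hand out of clusterstate.json. It assumes each range is written as two unsigned hex integers joined by '-' (for example 80000000-ffffffff); RangeCoverageCheck, parse and problems are hypothetical names, and Java 16+ is needed for the record.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical check: confirm that shard hash ranges (e.g. copied from
// clusterstate.json) are disjoint and together cover every 32-bit hash,
// which is what lets each id map to exactly one shard.
final class RangeCoverageCheck {

    record Range(String shard, int min, int max) {}

    /** Parses "min-max" written as unsigned hex ints, e.g. "80000000-ffffffff" (assumed format). */
    static Range parse(String shard, String hexRange) {
        String[] parts = hexRange.split("-");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected min-max, got " + hexRange);
        }
        return new Range(shard,
                Integer.parseUnsignedInt(parts[0], 16),
                Integer.parseUnsignedInt(parts[1], 16));
    }

    /** Returns the problems found; an empty list means the ranges tile the hash space exactly. */
    static List<String> problems(List<Range> ranges) {
        List<String> out = new ArrayList<>();
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingInt(Range::min));
        long next = Integer.MIN_VALUE;         // lowest hash not yet covered
        for (Range r : sorted) {
            if (r.min() > r.max()) {
                out.add(r.shard() + ": min is greater than max");
                continue;
            }
            if (r.min() > next) {
                out.add("gap before " + r.shard());
            } else if (r.min() < next) {
                out.add("overlap at " + r.shard());
            }
            next = Math.max(next, (long) r.max() + 1);
        }
        if (next <= Integer.MAX_VALUE) {
            out.add("hash space not covered up to Integer.MAX_VALUE");
        }
        return out;
    }

    public static void main(String[] args) {
        List<Range> ranges = List.of(
                parse("shard1", "80000000-ffffffff"),
                parse("shard2", "0-7fffffff"));
        List<String> found = problems(ranges);
        System.out.println(found.isEmpty() ? "ranges OK" : found);
    }
}
```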
>> >>>>
>> >>>> On Apr 3, 2013, at 1:13 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>> >>>>
>> >>>>> No, my thought was wrong; it appears that even with the parameter set
>> >>>>> I am seeing this behavior. I've been able to duplicate it on 4.2.0 by
>> >>>>> indexing 100,000 documents on 10 threads (10,000 each) when I get to
>> >>>>> 400,000 or so. I will try this on 4.2.1 to see if I see the same
>> >>>>> behavior.
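A reproduction along the lines Jamie describes (10 threads, 10,000 documents each) could be driven by a harness like this Java sketch. It records every key handed to a pluggable indexer so the number of unique keys sent can be compared with the document count the cluster reports. The real update call (for example through SolrJ) is deliberately left out, and ConcurrentIndexLoad is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical load harness: `threads` workers each send `docsPerThread`
// documents with random UUID keys to a pluggable indexer and record every
// key sent, so the total can be compared with numDocs in the cluster.
final class ConcurrentIndexLoad {

    static Set<String> run(int threads, int docsPerThread, Consumer<String> indexer) {
        Set<String> sent = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < docsPerThread; i++) {
                        String key = UUID.randomUUID().toString();
                        indexer.accept(key);   // real code would add a document here
                        sent.add(key);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get();                       // rethrows any failure from a worker
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while indexing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("an indexing thread failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return sent;
    }

    public static void main(String[] args) {
        // 10 threads x 10,000 docs, as in the thread; the indexer is a no-op stand-in.
        Set<String> keys = run(10, 10_000, key -> { });
        System.out.println("unique keys sent: " + keys.size());
    }
}
```

Calling get() on each Future rethrows any exception from a worker thread, so a failed batch surfaces instead of silently lowering the count.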
>> >>>>>
>> >>>>>
>> >>>>> On Wed, Apr 3, 2013 at 12:37 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>> >>>>>
>> >>>>>> Since I don't have that many items in my index, I exported all of the
>> >>>>>> keys for each shard and wrote a simple Java program that checks for
>> >>>>>> duplicates. I found some duplicate keys on different shards, and a
>> >>>>>> grep of the files for the keys found does indicate that they made it
>> >>>>>> to the wrong places. As you can see below, documents with the same ID
>> >>>>>> are on shard 3 and shard 5. Is it possible that the hash is being
>> >>>>>> calculated taking into account only the "live" nodes? I know that we
>> >>>>>> don't specify the numShards param at startup, so could this be what
>> >>>>>> is happening?
>> >>>>>>
>> >>>>>> grep -c "7cd1a717-3d94-4f5d-bcb1-9d8a95ca78de" *
>> >>>>>> shard1-core1:0
>> >>>>>> shard1-core2:0
>> >>>>>> shard2-core1:0
>> >>>>>> shard2-core2:0
>> >>>>>> shard3-core1:1
>> >>>>>> shard3-core2:1
>> >>>>>> shard4-core1:0
>> >>>>>> shard4-core2:0
>> >>>>>> shard5-core1:1
>> >>>>>> shard5-core2:1
>> >>>>>> shard6-core1:0
>> >>>>>> shard6-core2:0
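The duplicate check Jamie mentions (export the keys for each shard, then look for keys present on more than one shard) might look like this Java 11 sketch. It assumes one exported file per core, one key per line, named like the grep output above (shard3-core1, shard3-core2, ...), and it folds the cores of each shard together so that a key held by both replicas of a single shard is not reported. CrossShardDuplicates is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical cross-shard duplicate check. Assumes one exported key file per
// core (one key per line) named like "shard3-core1"; cores of the same shard
// are merged so keys present on both replicas of one shard are not flagged.
final class CrossShardDuplicates {

    /** Shard name from a file name such as "shard3-core2", giving "shard3". */
    static String shardOf(String fileName) {
        int i = fileName.lastIndexOf("-core");
        return i > 0 ? fileName.substring(0, i) : fileName;
    }

    /** Reads every regular file in dir and groups its keys by shard. */
    static Map<String, List<String>> load(Path dir) throws IOException {
        Map<String, List<String>> keysByShard = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    keysByShard.computeIfAbsent(shardOf(file.getFileName().toString()),
                            s -> new ArrayList<>()).addAll(Files.readAllLines(file));
                }
            }
        }
        return keysByShard;
    }

    /** Returns each key found on two or more shards, mapped to those shards. */
    static Map<String, SortedSet<String>> find(Map<String, List<String>> keysByShard) {
        Map<String, SortedSet<String>> shardsByKey = new HashMap<>();
        keysByShard.forEach((shard, keys) -> {
            for (String raw : keys) {
                String key = raw.strip();
                if (!key.isEmpty()) {
                    shardsByKey.computeIfAbsent(key, k -> new TreeSet<>()).add(shard);
                }
            }
        });
        shardsByKey.values().removeIf(shards -> shards.size() < 2);
        return shardsByKey;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        find(load(dir)).forEach((key, shards) -> System.out.println(key + " " + shards));
    }
}
```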
>> >>>>>>
>> >>>>>>
>> >>>>>> On Wed, Apr 3, 2013 at 10:42 AM, Jamie Johnson <jej2003@gmail.com> wrote:
>> >>>>>>
>> >>>>>>> Something interesting that I'm noticing as well: I just indexed
>> >>>>>>> 300,000 items, and somehow 300,020 ended up in the index. I thought
>> >>>>>>> perhaps I messed something up, so I started the indexing again and
>> >>>>>>> indexed another 400,000, and I see 400,064 docs. Is there a good way
>> >>>>>>> to find possible duplicates? I had tried to facet on key (our id
>> >>>>>>> field), but that didn't give me anything with more than a count of 1.
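A facet request can be limited to key values that occur in at least two documents. This Java sketch only builds the URL; q, rows, facet, facet.field, facet.mincount, facet.limit and wt are standard Solr request parameters, and facet.limit=-1 removes the default limit of 100 returned values. The field name key comes from this thread, while the collection URL and DuplicateKeyFacetQuery are illustrative assumptions. On an index of several hundred thousand unique keys, faceting on the key field may be slow and memory-hungry.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: builds a /select URL that facets on the key field and
// returns only values that occur in two or more documents.
final class DuplicateKeyFacetQuery {

    static String build(String collectionUrl, String keyField) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");               // match every document
        params.put("rows", "0");              // facet counts only, no documents
        params.put("facet", "true");
        params.put("facet.field", keyField);
        params.put("facet.mincount", "2");    // drop keys seen only once
        params.put("facet.limit", "-1");      // no cap on returned facet values
        params.put("wt", "json");
        return collectionUrl + "/select?" + params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                        + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        System.out.println(build("http://10.38.33.16:7575/solr/collection1", "key"));
    }
}
```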
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Wed, Apr 3, 2013 at 9:22 AM, Jamie Johnson <jej2003@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>>> OK, so clearing the transaction log allowed things to go again. I am
>> >>>>>>>> going to clear the index and try to replicate the problem on 4.2.0,
>> >>>>>>>> and then I'll try on 4.2.1.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> On Wed, Apr 3, 2013 at 8:21 AM, Mark Miller <markrmiller@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>>> No, not that I know of, which is why I say we need to get to the
>> >>>>>>>>> bottom of it.
>> >>>>>>>>>
>> >>>>>>>>> - Mark
>> >>>>>>>>>
>> >>>>>>>>> On Apr 2, 2013, at 10:18 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Mark,
>> >>>>>>>>>> Is there a particular JIRA issue that you think may address this? I
>> >>>>>>>>>> read through it quickly but didn't see one that jumped out.
>> >>>>>>>>>> On Apr 2, 2013 10:07 PM, "Jamie Johnson" <jej2003@gmail.com> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> I brought the bad one down and back up, and it did nothing. I can
>> >>>>>>>>>>> clear the index and try 4.2.1. I will save off the logs and see if
>> >>>>>>>>>>> there is anything else odd.
>> >>>>>>>>>>> On Apr 2, 2013 9:13 PM, "Mark Miller" <markrmiller@gmail.com> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> It would appear it's a bug given what you have said.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Any other exceptions would be useful. Might be best to start
>> >>>>>>>>>>>> tracking in a JIRA issue as well.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> To fix, I'd bring the behind node down and back again.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Unfortunately, I'm pressed for time, but we really need to get to
>> >>>>>>>>>>>> the bottom of this and fix it, or determine if it's fixed in 4.2.1
>> >>>>>>>>>>>> (spreading to mirrors now).
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> - Mark
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> On Apr 2, 2013, at 7:21 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> Sorry, I didn't ask the obvious question. Is there anything else
>> >>>>>>>>>>>>> that I should be looking for here, and is this a bug? I'd be happy
>> >>>>>>>>>>>>> to trawl through the logs further if more information is needed;
>> >>>>>>>>>>>>> just let me know.
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> Also, what is the most appropriate mechanism to fix this? Is it
>> >>>>>>>>>>>>> required to kill the index that is out of sync and let Solr resync
>> >>>>>>>>>>>>> things?
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>> >>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Sorry for spamming here...
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> shard5-core2 is the instance we're having issues with...
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>> >>>>>>>>>>>>>> SEVERE: shard update error StdNode: http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException: Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok status:503, message:Service Unavailable
>> >>>>>>>>>>>>>>    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373)
>> >>>>>>>>>>>>>>    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
>> >>>>>>>>>>>>>>    at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
>> >>>>>>>>>>>>>>    at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
>> >>>>>>>>>>>>>>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> >>>>>>>>>>>>>>    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> >>>>>>>>>>>>>>    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
>> >>>>>>>>>>>>>>    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> >>>>>>>>>>>>>>    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> >>>>>>>>>>>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> >>>>>>>>>>>>>>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> >>>>>>>>>>>>>>    at java.lang.Thread.run(Thread.java:662)
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Here is another one that looks interesting:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log
>> >>>>>>>>>>>>>>> SEVERE: org.apache.solr.common.SolrException: ClusterState says we are the leader, but locally we don't think so
>> >>>>>>>>>>>>>>>    at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293)
>> >>>>>>>>>>>>>>>    at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228)
>> >>>>>>>>>>>>>>>    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339)
>> >>>>>>>>>>>>>>>    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
>> >>>>>>>>>>>>>>>    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
>> >>>>>>>>>>>>>>>    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
>> >>>>>>>>>>>>>>>    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
>> >>>>>>>>>>>>>>>    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>> >>>>>>>>>>>>>>>    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>> >>>>>>>>>>>>>>>    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
>> >>>>>>>>>>>>>>>    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
>> >>>>>>>>>>>>>>>    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>> >>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> Looking at the master, it looks like at some point there were shards
>> >>>>>>>>>>>>>>>> that went down. I am seeing things like what is below.
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> INFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 12)
>> >>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process
>> >>>>>>>>>>>>>>>> INFO: Updating live nodes... (9)
>> >>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>> >>>>>>>>>>>>>>>> INFO: Running the leader process.
>> >>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>> >>>>>>>>>>>>>>>> INFO: Checking if I should try and be the leader.
>> >>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader
>> >>>>>>>>>>>>>>>> INFO: My last published State was Active, it's okay to be the leader.
>> >>>>>>>>>>>>>>>> Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess
>> >>>>>>>>>>>>>>>> INFO: I may be the new leader - try and sync
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller <markrmiller@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> I don't think the versions you are thinking of apply here. PeerSync
>> >>>>>>>>>>>>>>>>> does not look at those; it looks at version numbers for updates in
>> >>>>>>>>>>>>>>>>> the transaction log, comparing the last 100 of them on the leader
>> >>>>>>>>>>>>>>>>> and the replica. What it's saying is that the replica seems to have
>> >>>>>>>>>>>>>>>>> versions that the leader does not. Have you scanned the logs for any
>> >>>>>>>>>>>>>>>>> interesting exceptions?
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Did the leader change during the heavy indexing? Did any ZK session
>> >>>>>>>>>>>>>>>>> timeouts occur?
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> - Mark
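Mark's explanation (PeerSync compares recent update versions from the two transaction logs, and here the replica has versions the leader lacks) can be modeled as a set difference. This Java sketch is a simplified model, not PeerSync's code: the real sync makes its decision with its own thresholds, such as the ourLowThreshold and otherHigh values in the log further down the thread. VersionWindowDiff is a hypothetical name, and the numbers in main are toy values.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Simplified model (not PeerSync's actual logic): compare windows of recent
// update versions from two transaction logs and list the versions that only
// the replica has. Versions are treated as plain longs; larger means newer.
final class VersionWindowDiff {

    /** Keeps the n largest (most recent) versions, e.g. n = 100 as in the thread. */
    static List<Long> newest(List<Long> versions, int n) {
        return versions.stream()
                .sorted(Comparator.reverseOrder())
                .limit(n)
                .collect(Collectors.toList());
    }

    /** Versions in the replica's window that the leader's window lacks, newest first. */
    static Set<Long> onlyOnReplica(List<Long> replicaWindow, List<Long> leaderWindow) {
        Set<Long> diff = new TreeSet<Long>(Comparator.reverseOrder());
        diff.addAll(replicaWindow);
        diff.removeAll(new HashSet<>(leaderWindow));
        return diff;
    }

    public static void main(String[] args) {
        // Toy version numbers for illustration only.
        List<Long> replica = newest(List.of(105L, 104L, 103L, 102L), 100);
        List<Long> leader = newest(List.of(103L, 102L, 101L), 100);
        System.out.println("only on replica: " + onlyOnReplica(replica, leader));
    }
}
```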
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> On Apr 2, 2013, at 4:52 PM, Jamie Johnson <jej2003@gmail.com> wrote:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> I am currently looking at moving our Solr cluster to 4.2 and noticed a
>> >>>>>>>>>>>>>>>>>> strange issue while testing today. Specifically, the replica has a
>> >>>>>>>>>>>>>>>>>> higher version than the master, which is causing the index to not
>> >>>>>>>>>>>>>>>>>> replicate. Because of this, the replica has fewer documents than the
>> >>>>>>>>>>>>>>>>>> master. What could cause this, and how can I resolve it short of
>> >>>>>>>>>>>>>>>>>> taking down the index and scp'ing the right version in?
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> MASTER:
>> >>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
>> >>>>>>>>>>>>>>>>>> Num Docs: 164880
>> >>>>>>>>>>>>>>>>>> Max Doc: 164880
>> >>>>>>>>>>>>>>>>>> Deleted Docs: 0
>> >>>>>>>>>>>>>>>>>> Version: 2387
>> >>>>>>>>>>>>>>>>>> Segment Count: 23
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> REPLICA:
>> >>>>>>>>>>>>>>>>>> Last Modified: about an hour ago
>> >>>>>>>>>>>>>>>>>> Num Docs: 164773
>> >>>>>>>>>>>>>>>>>> Max Doc: 164773
>> >>>>>>>>>>>>>>>>>> Deleted Docs: 0
>> >>>>>>>>>>>>>>>>>> Version: 3001
>> >>>>>>>>>>>>>>>>>> Segment Count: 30
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> In the replica's log it says this:
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> INFO: Creating new http client, config:maxConnectionsPerHost=20&maxConnections=10000&connTimeout=30000&socketTimeout=30000&retry=false
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>> >>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr START replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>> >>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions
>> >>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Our versions are newer. ourLowThreshold=1431233788792274944 otherHigh=1431233789440294912
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync
>> >>>>>>>>>>>>>>>>>> INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr DONE. sync succeeded
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>>> which again seems to indicate that it thinks it has a newer version of
>> >>>>>>>>>>>>>>>>>> the index, so it aborts. This happened while having 10 threads
>> >>>>>>>>>>>>>>>>>> indexing 10,000 items, writing to a 6-shard (1 replica each)
>> >>>>>>>>>>>>>>>>>> cluster. Any thoughts on this, or on what I should look for, would be
>> >>>>>>>>>>>>>>>>>> appreciated.
