lucene-solr-user mailing list archives

From Shawn Heisey
Subject Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.
Date Fri, 17 Oct 2014 01:02:12 GMT
On 10/16/2014 6:27 PM, S.L wrote:
> 1. Java Version :java version "1.7.0_51"
> Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
> Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

I believe that Java 7 update 51 is one of the releases known to have 
bugs that affect Lucene.  If you can upgrade to update 67, that would 
be good, but I don't know that it's a pressing matter.  It looks like 
the Oracle JVM, which is good.

> 2.OS
> CentOS Linux release 7.0.1406 (Core)
> 3. Everything is 64 bit , OS , Java , and CPU.
> 4. Java Args.
>      -Dcatalina.home=/opt/tomcat1
>      -Dcatalina.base=/opt/tomcat1
>      -Djava.endorsed.dirs=/opt/tomcat1/endorsed
>      -DzkClientTimeout=20000
>      -DhostContext=solr
>      -Dport=8081
>      -Dsolr.solr.home=/opt/solr/home1
>      -Dfile.encoding=UTF8
>      -Duser.timezone=UTC
>      -XX:+UseG1GC
>      -XX:MaxPermSize=128m
>      -XX:PermSize=64m
>      -Xmx2048m
>      -Xms128m
>      -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager
>      -Djava.util.logging.config.file=/opt/tomcat1/conf/

I would not use the G1 collector myself, but with the heap at only 2GB, 
I don't know that it matters all that much.  Even a worst-case 
collection probably is not going to take more than a few seconds, and 
you've already increased the zookeeper client timeout.

> 5. Zookeeper ensemble has 3 zookeeper instances , which are external and
> are not embedded.
> 6. Container : I am using Apache Tomcat version 7.0.42
> *Additional Observations:*
> I queried all docs on both replicas with distrib=false&fl=id&sort=id+asc,
> then compared the two lists. Eyeballing the first few lines of ids in
> both lists, I could see that although each list has an equal number of
> documents (96309 each), the document ids in them seem to be *mutually
> exclusive*. I did not find a single common id in those lists (I
> checked at least 15 manually), so it looks to me like the replicas
> are disjoint sets.

Are you sure you hit both replicas of the same shard number?  If you 
are, then it sounds like something is going wrong with your document 
routing, or maybe your clusterstate is really messed up.  Recreating the 
collection from scratch and doing a full reindex might be a good plan 
... assuming this is possible for you.  You could create a whole new 
collection, and then when you're ready to switch, delete the original 
collection and create an alias so your app can still use the old name.
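
Rather than eyeballing the two id lists, the comparison could be 
scripted.  What follows is only a rough sketch, not something taken 
from your setup: the host names and core names in the comments are 
hypothetical placeholders, and fetch_ids/compare_replicas are 
illustrative helper names.  It assumes the uniqueKey field is "id" and 
uses the same distrib=false query you described, with wt=json added so 
the response is easy to parse.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_ids(core_url, rows=200000):
    """Return the set of ids held by a single core (no distributed search)."""
    params = urlencode({
        "q": "*:*",
        "distrib": "false",  # ask only this core, do not fan out
        "fl": "id",
        "sort": "id asc",
        "rows": rows,        # assumed to be larger than the core's doc count
        "wt": "json",
    })
    with urlopen(core_url + "/select?" + params) as resp:
        body = json.load(resp)
    return {doc["id"] for doc in body["response"]["docs"]}


def compare_replicas(ids_a, ids_b):
    """Count ids shared by both replicas and ids unique to each one."""
    a, b = set(ids_a), set(ids_b)
    return {
        "common": len(a & b),
        "only_a": len(a - b),
        "only_b": len(b - a),
    }


# Example use (hypothetical hosts and core names; point both URLs at
# replicas of the SAME shard):
#   a = fetch_ids("http://host1:8081/solr/collection1_shard1_replica1")
#   b = fetch_ids("http://host2:8081/solr/collection1_shard1_replica2")
#   print(compare_replicas(a, b))
```

For two healthy replicas of the same shard, "only_a" and "only_b" 
should both be zero.  If "common" is zero, the two cores really are 
disjoint, which would point at routing or clusterstate problems as 
described above.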

How much total RAM do you have on these systems, and how large are those 
index shards?  With a shard having 96K documents, it sounds like your 
whole index is probably just shy of 300K documents.

