lucene-solr-user mailing list archives

From Shawn Heisey <apa...@elyograg.org>
Subject Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.
Date Thu, 16 Oct 2014 04:05:24 GMT
On 10/15/2014 9:26 PM, S.L wrote:
> Look at the logging information I provided below. It looks like results
> are only being returned for this SolrCloud cluster if the request
> goes to one of the two replicas of a shard.
>
> I have verified that numDocs is the same in the replicas of a given shard,
> but maxDoc and deletedDocs differ. Does this signal that the replicas are
> out of sync?
>
> Even if numDocs is the same, how do we guarantee that those docs are
> identical and have the same unique keys? Is there a way to verify this?
> Since numDocs is the same across the replicas, and I still only get a
> result back when the request goes to one particular replica of the shard,
> I suspect that the documents in the replicas of that shard are not exact
> copies of each other.
>
> I suspect the issue I am facing in the 4.10.1 cloud is related to
> https://issues.apache.org/jira/browse/SOLR-4924.
>
> Can anyone please let me know how to solve this issue of intermittent
> empty results for a query?

Query with no results hits these cores:
server 2 shard 3 replica 1
server 3 shard 1 replica 1
server 1 shard 2 replica 1

Query with 1 result hits these cores:
server 2 shard 1 replica 2
server 3 shard 2 replica 2 (found 1)
server 1 shard 3 replica 2

Here are some URLs for testing.  They are directed at specific shard
replicas and are specifically NOT distributed queries (distrib=false):

http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/select?q=*:*&fq=id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb&distrib=false

http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/select?q=*:*&fq=id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb&distrib=false

If you run these queries (replacing server names and the /select request 
handler as appropriate), do you get 0 results on the first one and 1 
result on the second one?  If you do, then you've definitely got 
replicas out of sync.  If you get 1 result on both queries, then 
something else is breaking.  If by chance you have taken steps to fix 
this particular ID, pick another one that you know has a problem.
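
If you would rather script that check than paste URLs into a browser,
here is a minimal Python sketch.  The function names are made up for
illustration, and it assumes the handler returns JSON when wt=json is
added; the q, fq and distrib parameters are the same ones used in the
URLs above, with rows=0 added because only the count matters.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def replica_query_url(core_url, doc_id, handler="select"):
    """Build a non-distributed query for one document ID on one core."""
    params = {
        "q": "*:*",
        "fq": "id:" + doc_id,
        "distrib": "false",  # ask only this core, do not fan out to other shards
        "rows": 0,           # only numFound is needed, not the document itself
        "wt": "json",
    }
    return core_url.rstrip("/") + "/" + handler + "?" + urlencode(params)


def num_found(core_url, doc_id):
    """Return how many documents this core reports for the ID (normally 0 or 1)."""
    with urlopen(replica_query_url(core_url, doc_id)) as resp:
        return json.load(resp)["response"]["numFound"]


def replicas_disagree(counts):
    """True when the per-replica counts differ, i.e. the replicas are out of sync."""
    return len(set(counts.values())) > 1
```

Call num_found() once per replica core URL (for example the two
dyCollection1_shard2 cores above), collect the results in a dict keyed
by core, and pass that dict to replicas_disagree().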

There is no automated way to detect replicas out of sync.  You could
request all docs on both replicas with distrib=false&fl=id&sort=id+asc
(plus a rows value large enough to cover every document, since the
default only returns the first 10), then compare the two lists.
Depending on how many docs you have, those queries could take a while
to run.
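
A rough Python sketch of that comparison follows.  It is only an
illustration: the function names are invented, the uniqueKey field is
assumed to be "id", and it pages through each core with start/rows and
wt=json rather than asking for everything in one response.  Deep paging
this way gets slower as start grows, so expect it to take a while on a
large index.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get_json(url):
    """Fetch a URL and parse the body as JSON."""
    with urlopen(url) as resp:
        return json.load(resp)


def fetch_ids(core_url, rows=1000, get_json=http_get_json):
    """Collect every id from a single core, one page at a time."""
    ids = []
    start = 0
    while True:
        params = {
            "q": "*:*",
            "fl": "id",
            "sort": "id asc",
            "distrib": "false",  # query this core only
            "wt": "json",
            "start": start,
            "rows": rows,
        }
        url = core_url.rstrip("/") + "/select?" + urlencode(params)
        docs = get_json(url)["response"]["docs"]
        if not docs:
            break  # an empty page means we are past the last document
        ids.extend(doc["id"] for doc in docs)
        start += len(docs)
    return ids


def compare_id_lists(ids_a, ids_b):
    """Return (ids only in A, ids only in B), each sorted."""
    a, b = set(ids_a), set(ids_b)
    return sorted(a - b), sorted(b - a)
```

Run fetch_ids() against each replica of the same shard and hand the two
lists to compare_id_lists().  Two empty lists mean the replicas hold
the same set of IDs; anything else names the documents that are missing
on one side.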

If the replicas are out of sync, are there any ERROR entries in the Solr 
log, especially at the time that the problem docs were indexed?

Thanks,
Shawn

