lucene-solr-user mailing list archives

From Webster Homer <webster.ho...@sial.com>
Subject Solr 7.2.0 CDCR Issue with TLOG collections
Date Fri, 02 Mar 2018 19:39:20 GMT
We have been seeing strange behavior with CDCR on Solr 7.2.0.

We have a number of replicated collections with identical schemas. We found that
TLOG replicas give much more consistent search results than NRT replicas.

We created a collection using TLOG replicas in our QA clouds.
We have a locally hosted SolrCloud with 2 nodes, and all of our collections
have 2 shards. We use CDCR to replicate the collections from this environment
to 2 data centers hosted in Google Cloud. This seems to work fairly well for
our collections with NRT replicas; however, the new TLOG collection has problems.

The Google Cloud SolrCloud clusters have 4 nodes each (with 3 separate
ZooKeeper servers). Each collection has 2 shards with 2 replicas per shard.

We never see data show up in the cloud collections, but we do see tlog
files appear on the cloud servers. All of the servers report CDCR as
started, with buffers disabled.

The CDCR source configuration is:

"requestHandler":{"/cdcr":{
      "name":"/cdcr",
      "class":"solr.CdcrRequestHandler",
      "replica":[
        {
          "zkHost":"xxx-mzk01.sial.com:2181,xxx-mzk02.sial.com:2181,xxx-mzk03.sial.com:2181/solr",
          "source":"b2b-catalog-material-180124T",
          "target":"b2b-catalog-material-180124T"},
        {
          "zkHost":"yyyy-mzk01.sial.com:2181,yyyy-mzk02.sial.com:2181,yyyy-mzk03.sial.com:2181/solr",
          "source":"b2b-catalog-material-180124T",
          "target":"b2b-catalog-material-180124T"}],
      "replicator":{
        "threadPoolSize":4,
        "schedule":500,
        "batchSize":250},
      "updateLogSynchronizer":{"schedule":60000}}}}

The target configurations in the 2 clouds are the same:
"requestHandler":{"/cdcr":{
      "name":"/cdcr",
      "class":"solr.CdcrRequestHandler",
      "buffer":{"defaultState":"disabled"}}}

All of our collections have a timestamp field, index_date. In the source
collection, all of the records have an index_date of 2/28/2018, but the
latest index_date in the target collections is 1/26/2018.
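A minimal sketch of how that gap could be measured, assuming Python 3.7+ and plain HTTP access to a node in each cluster: it asks each collection for its single newest index_date through the standard /select handler (sorted descending, one row, only that field) and subtracts the target's value from the source's. The function names are hypothetical, and the date parsing assumes Solr's usual ISO-8601 UTC output such as 2018-02-28T00:00:00Z.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta
from typing import Optional


def latest_date_url(solr_base: str, collection: str, field: str = "index_date") -> str:
    """Build a /select URL that returns only the newest value of `field`."""
    params = urllib.parse.urlencode({
        "q": "*:*",
        "sort": f"{field} desc",  # newest document first
        "rows": 1,                # only that one document is needed
        "fl": field,              # and only the date field
        "wt": "json",
    })
    return f"{solr_base.rstrip('/')}/{collection}/select?{params}"


def parse_latest_date(response: dict, field: str = "index_date") -> Optional[datetime]:
    """Extract the newest date from a /select response; None if nothing matched."""
    docs = response.get("response", {}).get("docs", [])
    if not docs or field not in docs[0]:
        return None
    # Assumes ISO-8601 UTC strings like "2018-02-28T00:00:00Z"; Python < 3.11
    # rejects the trailing "Z", so rewrite it as an explicit +00:00 offset.
    return datetime.fromisoformat(docs[0][field].replace("Z", "+00:00"))


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Plain HTTP GET with no authentication or retries."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def replication_lag(source: Optional[datetime],
                    target: Optional[datetime]) -> Optional[timedelta]:
    """How far the target's newest document trails the source's newest one."""
    if source is None or target is None:
        return None
    return source - target
```

To use it, call parse_latest_date(fetch_json(latest_date_url(base, "b2b-catalog-material-180124T"))) once with a source node's base URL and once per target, then pass both results to replication_lag. A lag that keeps growing between runs would indicate that updates are not being applied at all, rather than merely arriving late.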

I don't see any CDCR errors in the logs, though we search them with Logstash
and are still refining that setup.
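While the log search is still being refined, the CDCR API on the source collection is another place to look. A minimal Python sketch, assuming HTTP access to a source node: it builds URLs for the read-only STATUS, QUEUES, ERRORS and OPS actions and flattens the per-target fields. The response shapes (status.process and status.buffer; queues and errors keyed by target zkHost and then target collection, carrying queueSize and consecutiveErrors) are assumed from the Solr 7 CDCR documentation and should be checked against a live response. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Read-only CDCR API actions; START/STOP and the buffer toggles are left out on purpose.
READ_ONLY_ACTIONS = ("STATUS", "QUEUES", "ERRORS", "OPS")


def cdcr_url(solr_base: str, collection: str, action: str) -> str:
    """Build a CDCR API URL for a collection's /cdcr handler."""
    if action not in READ_ONLY_ACTIONS:
        raise ValueError(f"not a read-only CDCR action: {action}")
    # json.nl=map asks Solr to render named lists as JSON objects, not flat arrays.
    params = urllib.parse.urlencode({"action": action, "wt": "json", "json.nl": "map"})
    return f"{solr_base.rstrip('/')}/{collection}/cdcr?{params}"


def parse_status(response: dict) -> tuple:
    """Return (process, buffer) from a STATUS response, e.g. ("started", "disabled")."""
    status = response.get("status", {})
    return status.get("process"), status.get("buffer")


def per_target(response: dict, section: str, key: str) -> dict:
    """Flatten response[section][zkHost][collection][key] into {(zkHost, collection): value}.

    Use section="queues", key="queueSize" for QUEUES and
    section="errors", key="consecutiveErrors" for ERRORS.
    """
    flat = {}
    for zk_host, targets in response.get(section, {}).items():
        for target_collection, info in targets.items():
            flat[(zk_host, target_collection)] = info.get(key)
    return flat


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Plain HTTP GET with no authentication or retries."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Restricting the helper to read-only actions keeps a monitoring script from accidentally stopping replication. A queue that keeps growing, or a non-zero consecutiveErrors count, would point at the replicator; a queue that stays near zero while the targets still lack documents would suggest the updates are being forwarded but not applied on the target side.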

We have a number of similar collections that behave correctly. This is the
only collection that uses TLOG replicas, so it appears that CDCR does not
support TLOG collections.

This is beginning to look like a bug.
