lucene-dev mailing list archives

From Rafał Kuć (Created) (JIRA) <>
Subject [jira] [Created] (SOLR-3001) Documents dropping when using DistributedUpdateProcessor
Date Tue, 03 Jan 2012 16:04:39 GMT
Documents dropping when using DistributedUpdateProcessor

                 Key: SOLR-3001
             Project: Solr
          Issue Type: Bug
          Components: SolrCloud
    Affects Versions: 4.0
         Environment: Windows 7, Ubuntu
            Reporter: Rafał Kuć

I have a problem with distributed indexing in the solrcloud branch. I've set up a cluster with
three Solr servers and I'm using DistributedUpdateProcessor to do the distributed indexing.
What I've noticed is that when indexing with StreamingUpdateSolrServer or CommonsHttpSolrServer
using a queue or a list that holds more than one document, documents seem to be dropped.
I ran some tests that tried to index 450k documents. If I sent the documents one by one,
indexing executed properly and the three Solr instances held 450k documents (when summed up).
However, when I tried to add documents in batches (for example with StreamingUpdateSolrServer
and a queue of 1000), the shard I was sending the documents to held only a minimal number of
documents (about 100) while the other shards held about 150k documents each.
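For context, DistributedUpdateProcessor decides which shard receives each document by hashing its unique key. Here is a minimal, self-contained sketch of that idea; the hash scheme, class name, and shard count are illustrative and are not Solr's actual implementation:

```java
public class ShardRouter {
    // Pick a shard index from a document's unique key.
    // Illustrative only: Solr's real hash scheme differs from String.hashCode().
    static int shardFor(String uniqueKey, int numShards) {
        // floorMod keeps the result in [0, numShards) even for negative hash codes
        return Math.floorMod(uniqueKey.hashCode(), numShards);
    }

    public static void main(String[] args) {
        for (String id : new String[] {"doc-1", "doc-2", "doc-3"}) {
            System.out.println(id + " -> shard" + shardFor(id, 3));
        }
    }
}
```

If batched requests are mishandled anywhere along this forwarding path, documents destined for remote shards can silently disappear, which would match the symptom described above.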

Each Solr was started with a single core and in ZooKeeper mode. An example solr.xml file:
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
 <cores defaultCoreName="collection1" adminPath="/admin/cores" zkClientTimeout="10000"
        hostPort="8983" hostContext="solr">
  <core shard="shard1" instanceDir="." name="collection1" />
 </cores>
</solr>

The solrconfig.xml file on each of the shards contained the following entries:
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
 <lst name="defaults">
  <str name="update.chain">distrib</str>
 </lst>
</requestHandler>

<updateRequestProcessorChain name="distrib">
 <processor class="org.apache.solr.update.processor.DistributedUpdateProcessorFactory" />
 <processor class="solr.LogUpdateProcessorFactory" />
 <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

I found a solution, but I don't know if it is a proper one. I've modified the code responsible
for handling the replicas in {{private List<String> setupRequest(int hash)}} of {{DistributedUpdateProcessorFactory}}.

I've added the following code:
if (urls == null) {
  urls = new ArrayList<String>(1);
} else {
  if (!urls.contains(leaderUrl)) {
    urls = getReplicaUrls(req, collection, shardId, nodeName);
  }
}
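To show the guard's behavior in isolation, here is a self-contained sketch that mirrors the added logic, with a placeholder standing in for getReplicaUrls(...); the class name, helper, and URL values are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReplicaUrlGuard {
    // Hypothetical stand-in for getReplicaUrls(req, collection, shardId, nodeName):
    // returns the full list of replica URLs for the shard.
    static List<String> fetchReplicaUrls() {
        return new ArrayList<String>(Arrays.asList(
            "http://host1:8983/solr", "http://host2:8983/solr"));
    }

    // Mirrors the guard added in setupRequest: create a list when none exists,
    // and refresh the list when the leader is missing from the known replicas.
    static List<String> ensureUrls(List<String> urls, String leaderUrl) {
        if (urls == null) {
            urls = new ArrayList<String>(1);
        } else if (!urls.contains(leaderUrl)) {
            urls = fetchReplicaUrls();
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> urls = ensureUrls(null, "http://leader:8983/solr");
        System.out.println(urls.isEmpty()); // prints "true"
    }
}
```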

If this is the proper approach I'll be glad to provide a patch with the modification. 

Rafał Kuć
Sematext :: :: Solr - Lucene - Nutch
Lucene ecosystem search ::

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.

