lucene-dev mailing list archives

From Rafał Kuć (Commented) (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SOLR-3001) Documents dropping when using DistributedUpdateProcessor
Date Tue, 03 Jan 2012 17:14:39 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178827#comment-13178827 ]

Rafał Kuć commented on SOLR-3001:
---------------------------------

Thanks for the information, Mark. That may be the case, as I'm using a SolrCloud build that is about
2-3 weeks old. I'll verify that as soon as I can.
                
> Documents dropping when using DistributedUpdateProcessor
> --------------------------------------------------------
>
>                 Key: SOLR-3001
>                 URL: https://issues.apache.org/jira/browse/SOLR-3001
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.0
>         Environment: Windows 7, Ubuntu
>            Reporter: Rafał Kuć
>
> I have a problem with distributed indexing in the solrcloud branch. I've set up a cluster
> with three Solr servers, and I'm using DistributedUpdateProcessor to do the distributed indexing.
> I've noticed that when indexing with StreamingUpdateSolrServer or CommonsHttpSolrServer
> and sending a queue or a list containing more than one document, documents seem to be dropped.
> I ran tests that indexed 450k documents. When I sent the documents one by one, indexing
> worked correctly and the three Solr instances held 450k documents in total. However, when
> I added documents in batches (for example with StreamingUpdateSolrServer and a queue of 1000),
> the shard I was sending the documents to ended up with very few documents (about 100), while
> the other shards each had about 150k documents.
> Each Solr instance was started with a single core in ZooKeeper mode. An example solr.xml file:
> {noformat} 
> <?xml version="1.0" encoding="UTF-8" ?>
> <solr persistent="true">
>  <cores defaultCoreName="collection1" adminPath="/admin/cores" zkClientTimeout="10000" hostPort="8983" hostContext="solr">
>   <core shard="shard1" instanceDir="." name="collection1" />
>  </cores>
> </solr>
> {noformat} 
> The solrconfig.xml file on each shard contained the following entries:
> {noformat} 
> <requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
>  <lst name="defaults">
>   <str name="update.chain">distrib</str>
>  </lst>
> </requestHandler>
> {noformat} 
> {noformat} 
> <updateRequestProcessorChain name="distrib">
>  <processor class="org.apache.solr.update.processor.DistributedUpdateProcessorFactory" />
>  <processor class="solr.LogUpdateProcessorFactory" />
>  <processor class="solr.RunUpdateProcessorFactory"/>
> </updateRequestProcessorChain>
> {noformat} 
> I found a solution, but I don't know whether it is the proper one. I've modified the code
> that is responsible for handling the replicas in:
> {{private List<String> setupRequest(int hash)}} of {{DistributedUpdateProcessorFactory}}
> I've added the following code:
> {noformat} 
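> // make sure the shard leader is always among the URLs the update is sent to
> if (urls == null) {
>  // no replica URLs were returned: send the update to the leader only
>  urls = new ArrayList<String>(1);
>  urls.add(leaderUrl);
> } else {
>  // replica URLs were returned: append the leader unless it is already listed
>  if (!urls.contains(leaderUrl)) {
>   urls.add(leaderUrl);
>  }
> }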
> {noformat} 
> after:
> {noformat} 
> urls = getReplicaUrls(req, collection, shardId, nodeName);
> {noformat} 
> If this is the proper approach, I'll be glad to provide a patch with the modification.
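For illustration, the proposed change can be written as a standalone helper. This is a minimal sketch, not the code in DistributedUpdateProcessor: the class `LeaderUrls`, the method `withLeader` and the example URLs are hypothetical, and it assumes, as the `urls == null` check in the proposal suggests, that the replica list may be `null` when no replica URLs are found. Whether adding the leader at this point is the correct fix is the open question in this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LeaderUrls {

    // Returns the URLs an update should be sent to, with the shard leader
    // included exactly once. Hypothetical helper mirroring the proposed
    // null / contains() checks.
    public static List<String> withLeader(List<String> replicaUrls, String leaderUrl) {
        List<String> urls;
        if (replicaUrls == null) {
            // no replica URLs: the leader will be the only target
            urls = new ArrayList<String>(1);
        } else {
            // copy so the caller's list is left untouched
            urls = new ArrayList<String>(replicaUrls);
        }
        // add the leader unless it is already one of the targets
        if (!urls.contains(leaderUrl)) {
            urls.add(leaderUrl);
        }
        return urls;
    }

    public static void main(String[] args) {
        String leader = "http://host1:8983/solr/collection1/";
        System.out.println(withLeader(null, leader));
        System.out.println(withLeader(
                Arrays.asList("http://host2:8983/solr/collection1/"), leader));
    }
}
```

Unlike the snippet above, the helper copies the replica list instead of adding to it in place, and the two branches of the original `if` collapse into a single `contains` check.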

> --
> Regards
> Rafał Kuć
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/

--
This message is automatically generated by JIRA.