lucene-solr-user mailing list archives

From Lindsay Martin <lmar...@abebooks.com>
Subject Re: Unexplained leader initiated recovery after updates - SolrCmdDistributor no longer retries on RemoteSolrException
Date Tue, 13 Jan 2015 19:29:01 GMT
We are experiencing unexpected recovery events when a leader is sending
updates to a replica. A "java.net.SocketException: Connection reset" is
encountered when updating the replica, which triggers the recovery.

In our previous Solr 4.6.1 installation, update errors triggered retry
logic in the SolrCmdDistributor, and the updates continued without
triggering a leader-initiated recovery.
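
To illustrate the kind of behavior I mean, here is a minimal sketch of
retry-on-connection-error logic. It is not the actual SolrCmdDistributor
code: the class RetrySketch, the UpdateSender interface, and the
maxAttempts parameter are all hypothetical names made up for this example.

```java
import java.io.IOException;
import java.net.SocketException;

// Hypothetical sketch; not taken from the Solr code base.
public class RetrySketch {

    /** Hypothetical stand-in for forwarding one update to a replica. */
    @FunctionalInterface
    public interface UpdateSender {
        void send() throws IOException;
    }

    /**
     * Calls sender.send() up to maxAttempts times, retrying only when a
     * SocketException (e.g. "Connection reset") is thrown.
     * Returns true if an attempt succeeded, false if every attempt failed,
     * which is the point where a caller might ask the replica to recover.
     * Any other IOException is propagated to the caller immediately.
     */
    public static boolean sendWithRetry(UpdateSender sender, int maxAttempts)
            throws IOException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                sender.send();
                return true;
            } catch (SocketException e) {
                // Treated as transient in this sketch: fall through and retry.
            }
        }
        return false;
    }
}
```

With something along these lines, a single connection reset would be
retried rather than immediately putting the replica into recovery.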


In our current 4.10.2 installation, this retry logic no longer occurs.


It looks like the fix for https://issues.apache.org/jira/browse/SOLR-5509
removed this retry logic. See
https://svn.apache.org/viewvc/lucene/dev/trunk/solr/core/src/java/org/apache/solr/update/SolrCmdDistributor.java?r1=1546672&r2=1546164&pathrev=1546672
This change was introduced with Solr 4.7.

The retry logic appears to have been removed while investigating an
unstable test. I am wondering if the retry logic should be restored for
production use.

Should I open a ticket to restore the retry logic?

Thanks,

Lindsay 

On 2015-01-12, 5:36 PM, "Lindsay Martin" <lmartin@abebooks.com> wrote:

>I have uncovered some additional details in the shard leader log:
>
>2015-01-11 09:38:00.693 [qtp268575911-3617101] INFO
>org.apache.solr.update.processor.LogUpdateProcessor - [listings]
>webapp=/solr path=/update
>params{distrib.from=http://solr05.search.abebooks.com:8983/solr/listings/&update.distrib=TOLEADER&wt=javabin&version=2}
>{add=[14065572860 (1490024273004199936)]} 0 707
>2015-01-11 09:38:00.913 [updateExecutor-1-thread-35734] ERROR
>org.apache.solr.update.StreamingSolrServers - error
>java.net.SocketException: Connection reset
>
>snip

