lucene-dev mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: Solr Replication Test Case Failure
Date Sat, 31 Jul 2010 15:52:13 GMT
OK, can you try to reproduce now?
Since the comments indicated that all the commits were to bump up the
index version number, I kept them all and just inserted an additional
commit in the query retry loop.
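
For reference, the shape of that retry loop is roughly the following. This is only a minimal sketch and not the actual test code: the class name CommitRetry, the method queryWithCommitRetry, and its parameters are all hypothetical, and the commit and query are passed in as plain callbacks so the example does not depend on SolrJ.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of a "commit again, then re-query" retry loop.
// None of these names come from TestReplicationHandler.
public final class CommitRetry {

    private CommitRetry() {}

    /**
     * Queries up to maxAttempts times. After every miss it issues another
     * commit before re-querying. Returns the last observed hit count.
     */
    static long queryWithCommitRetry(Runnable commit,
                                     LongSupplier query,
                                     long expected,
                                     int maxAttempts,
                                     long sleepMillis) throws InterruptedException {
        long found = query.getAsLong();
        for (int attempt = 1; attempt < maxAttempts && found != expected; attempt++) {
            commit.run();              // the additional commit inside the loop
            Thread.sleep(sleepMillis); // brief pause before asking again
            found = query.getAsLong();
        }
        return found;
    }
}
```

In a real test the commit callback would presumably wrap something like slaveClient.commit(true, true) and the query callback would return the number of hits for the document being checked.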

But actually... there may still be a bug somewhere (even if this fixes
the test failures).
Each commit should wait for a new searcher to be registered before
returning, so it should be impossible for overlapping warming
searchers to be responsible for the failure.  That means that when the
test fails, either the doc add or the commit itself is failing.

-Yonik
http://www.lucidimagination.com



On Sat, Jul 31, 2010 at 11:35 AM, Yonik Seeley
<yonik@lucidimagination.com> wrote:
> Do the logs give any hints?
> Downside of only logging SEVERE is that it's much harder to
> investigate the cause of any intermittent failures that do happen.
>
> Looking at this test code, you shouldn't have to wait at all.  The
> test disables replication, indexes docs to the slave, commits (and
> waits for a new searcher to be registered), and then queries the
> slave.
>
> We should just remove that wait loop.
>
> Oh... I think I just figured it out while writing this...
>
>    index(slaveClient, "id", 551, "name", "name = " + 551);
>    slaveClient.commit(true, true);
>    index(slaveClient, "id", 552, "name", "name = " + 552);
>    slaveClient.commit(true, true);
>    index(slaveClient, "id", 553, "name", "name = " + 553);
>    slaveClient.commit(true, true);
>    index(slaveClient, "id", 554, "name", "name = " + 554);
>    slaveClient.commit(true, true);
>    index(slaveClient, "id", 555, "name", "name = " + 555);
>    slaveClient.commit(true, true);
>
> I bet that last commit can fail due to max warming searchers.
> I'll fix.
>
> -Yonik
> http://www.lucidimagination.com
>
> On Sat, Jul 31, 2010 at 8:41 AM, Mark Miller <markrmiller@gmail.com> wrote:
>>
>>
>>  This looks like it might actually be an issue - it fails once every 20
>> runs or so as a guess.
>>
>>    [junit] Testsuite: org.apache.solr.handler.TestReplicationHandler
>>    [junit] Testcase: testReplicateAfterWrite2Slave(org.apache.solr.handler.TestReplicationHandler): FAILED
>>    [junit] expected:<1> but was:<0>
>>    [junit] junit.framework.AssertionFailedError: expected:<1> but was:<0>
>>    [junit]     at org.apache.solr.handler.TestReplicationHandler.testReplicateAfterWrite2Slave(TestReplicationHandler.java:464)
>>    [junit]
>>    [junit]
>>    [junit] Tests run: 7, Failures: 1, Errors: 0, Time elapsed: 343.909 sec
>>
>> At first I tried to extend the wait for it, but that's obviously no help
>> - in this case the test failed after running for 343 seconds. I've seen it
>> as high as 968 seconds.
>>
>> - Mark
>
