lucene-dev mailing list archives

From "Cao Manh Dat (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SOLR-8292) TransactionLog.next() does not honor contract and return null for EOF
Date Fri, 06 Jan 2017 09:19:58 GMT

    [ https://issues.apache.org/jira/browse/SOLR-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803900#comment-15803900 ]

Cao Manh Dat edited comment on SOLR-8292 at 1/6/17 9:19 AM:
------------------------------------------------------------

Hi Erick, I'm not very familiar with the CDCR code, but I will give it a try today. Do we have
any test that reproduces this error?

Here is the error log from SOLR-4116:
{code}
2012-11-28 11:32:33,086 WARN [solr.update.UpdateLog] - [recoveryExecutor-8-thread-1] - : Starting
log replay tlog{file=/opt/solr/cores/openindex_e/data/tlog/tlog.0000000000000000028 refcount=2}
active=false starting pos=0
{code}
If your log also has the flags {{active=false starting pos=0}}, then I think the two issues may be
related.


was (Author: caomanhdat):
Hi Erick, I'm not very familiar with the CDCR code, but I will give it a try today. Do we have
any test that reproduces this error?

> TransactionLog.next() does not honor contract and return null for EOF
> ---------------------------------------------------------------------
>
>                 Key: SOLR-8292
>                 URL: https://issues.apache.org/jira/browse/SOLR-8292
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>         Attachments: SOLR-8292.patch
>
>
> This came to light in CDCR testing, which stresses this code a lot. There's a stack trace
showing this line (641 trunk) throwing an EOF exception:
> o = codec.readVal(fis);
> At first I thought to just wrap reading fis in a try/catch and return null, but looking
at the code a bit more I'm not so sure; that seems like it'd mask what looks at first glance
like a bug in the logic.
> A few lines earlier (633-4) there are these lines:
> // shouldn't currently happen - header and first record are currently written at the
same time
> if (fis.position() >= fos.size()) {
> Why are we comparing the input file position against the size of the output file?
Maybe because the 'i' key is right next to the 'o' key? The comment hints that it's checking
for the ability to read the first record in input stream along with the header. And perhaps
there's a different issue here because the expectation clearly is that the first record should
be there if the header is.
> So what's the right thing to do? Wrap in a try/catch and return null for EOF? Change
the test? Do both?
> I can take care of either, but wanted a clue whether the comparison of fis to fos is
intended.
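
To make the two options in the description concrete, here is a minimal sketch of a reader whose next() returns null at a clean end of input and still throws when a record is cut short, so that a truncated log is not silently masked. It is not Solr's TransactionLog: the class LogReaderSketch, the encode helper, and the length-prefixed record format are all hypothetical, chosen only for illustration. Note that the EOF check compares the read position against the size of the data being read, not against a separate output stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a log reader; NOT Solr's TransactionLog. */
final class LogReaderSketch {
    private final ByteBuffer in;

    LogReaderSketch(byte[] data) {
        this.in = ByteBuffer.wrap(data);
    }

    /** Returns the next record, or null once all input has been consumed. */
    String next() throws IOException {
        // Clean EOF: the read position has reached the end of the input itself.
        if (in.position() >= in.limit()) {
            return null;
        }
        // Anything past this point that runs out of bytes is a truncated record,
        // which is reported instead of being turned into a quiet null.
        if (in.remaining() < Integer.BYTES) {
            throw new EOFException("truncated record length at " + in.position());
        }
        int len = in.getInt();
        if (len < 0 || len > in.remaining()) {
            throw new EOFException("truncated record body at " + in.position());
        }
        byte[] body = new byte[len];
        in.get(body);
        return new String(body, StandardCharsets.UTF_8);
    }

    /** Assumed format for this sketch: 4-byte big-endian length, then UTF-8 bytes. */
    static byte[] encode(String... records) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String record : records) {
            byte[] body = record.getBytes(StandardCharsets.UTF_8);
            out.writeInt(body.length);
            out.write(body);
        }
        out.flush();
        return bytes.toByteArray();
    }
}
```

Whether a truncated record should surface as an exception or as null is exactly the open question in this issue; the sketch picks the stricter behaviour only to show how the two cases can be kept apart.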



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

