lucene-dev mailing list archives

From "Joel Bernstein (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-8709) Add checksum to the TopicStream to ensure delivery of all documents within a Topic
Date Tue, 05 Apr 2016 16:31:25 GMT

     [ https://issues.apache.org/jira/browse/SOLR-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joel Bernstein updated SOLR-8709:
---------------------------------
    Description: 
Currently the TopicStream can miss documents if version numbers are received out of order.
The TopicStream sorts on version number, so it will only miss out-of-order versions that span
commit boundaries. Stress testing was not able to create a missed-document scenario, but code
review points to the possibility of this happening.

To resolve this issue, we can adopt an approach that keeps a checksum of the version
numbers for a specific time window. This checksum can be checked on each run, and if the
checksums don't match, the documents from the time window can be resent. As long as the time
window is longer than the softCommit interval, this will guarantee delivery of all documents
for the Topic. This won't guarantee *one time delivery*, but it should provide a reasonable
expectation of one time delivery.
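
A rough sketch of the checksum idea, for illustration only. This is not the
TopicStream implementation: the class name VersionWindowChecksum, the methods
checksum, mix and needsResend, and the choice of an order-independent sum of
mixed version numbers are all assumptions made for this example. It shows how a
version that becomes visible late changes the window checksum and triggers a
resend of the window.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper, not part of Solr: checksum over versions in a time window. */
public class VersionWindowChecksum {

    /** Scrambles one version number; xor-shift and odd multiply are both invertible. */
    static long mix(long v) {
        v ^= (v >>> 33);
        v *= 0xff51afd7ed558ccdL;
        v ^= (v >>> 33);
        return v;
    }

    /**
     * Order-independent checksum: the wrapping sum of the mixed versions.
     * Addition is commutative, so arrival order does not affect the result.
     */
    static long checksum(List<Long> versions) {
        long sum = 0L;
        for (long v : versions) {
            sum += mix(v);
        }
        return sum;
    }

    /** True if the window no longer matches the checksum stored on the previous run. */
    static boolean needsResend(long previousChecksum, List<Long> currentWindowVersions) {
        return previousChecksum != checksum(currentWindowVersions);
    }

    public static void main(String[] args) {
        // Run 1: version 102 is not yet visible in the time window.
        List<Long> run1 = Arrays.asList(100L, 101L, 103L);
        long stored = checksum(run1);

        // Run 2: version 102 became visible after a later commit.
        List<Long> run2 = Arrays.asList(100L, 101L, 102L, 103L);

        if (needsResend(stored, run2)) {
            // The whole window is resent, so 100, 101 and 103 are delivered twice.
            System.out.println("checksum mismatch, resending window: " + run2);
        }
    }
}
```

Resending the whole window on a mismatch is what gives up strict one time
delivery here: documents already sent from that window are sent again along
with the one that was missed.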

  was:
Currently the TopicStream can miss documents if version numbers are received out-of-order.
The TopicStream sorts on version number so it will only miss out-of-order versions that span
commit boundaries.

In order to resolve this issue we can adopt an approach that keeps a set of the last N version
numbers sent for each Topic.  As the documents are scanned we can check for documents within
this time window that do not appear in the sent set. These documents can then be sent.


> Add checksum to the TopicStream to ensure delivery of all documents within a Topic
> ----------------------------------------------------------------------------------
>
>                 Key: SOLR-8709
>                 URL: https://issues.apache.org/jira/browse/SOLR-8709
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Joel Bernstein
>
> Currently the TopicStream can miss documents if version numbers are received out of order.
> The TopicStream sorts on version number, so it will only miss out-of-order versions that span
> commit boundaries. Stress testing was not able to create a missed-document scenario, but code
> review points to the possibility of this happening.
> To resolve this issue, we can adopt an approach that keeps a checksum of the version
> numbers for a specific time window. This checksum can be checked on each run, and if the
> checksums don't match, the documents from the time window can be resent. As long as the time
> window is longer than the softCommit interval, this will guarantee delivery of all documents
> for the Topic. This won't guarantee *one time delivery*, but it should provide a reasonable
> expectation of one time delivery.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


