uima-dev mailing list archives

From "Marshall Schor (JIRA)" <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-2670) FileSystemCollectionReader doesn't set lastSegment correctly
Date Thu, 13 Jun 2013 21:37:21 GMT

    [ https://issues.apache.org/jira/browse/UIMA-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682780#comment-13682780 ]

Marshall Schor commented on UIMA-2670:

This is "example" code (which, it's true, people often just use as-is). In this sample code,
the lastSegment flag is used to indicate the end of the collection.

Changing this could break existing uses by other users who rely on this bit of data.
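To make the compatibility concern concrete, here is a minimal sketch of a downstream consumer that treats lastSegment as an end-of-collection signal. DocInfo and EndOfCollectionConsumer are hypothetical stand-ins, not UIMA classes; they model only the offsetInSource and lastSegment features of SourceDocumentInformation. The sketch also assumes, as the expression mCurrentIndex == mFiles.size() suggests, that the reader advances its index past the file before setting the flag.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the SourceDocumentInformation features
// discussed here; this is not the real UIMA JCas type.
final class DocInfo {
    final String uri;
    final int offsetInSource;
    final boolean lastSegment;

    DocInfo(String uri, int offsetInSource, boolean lastSegment) {
        this.uri = uri;
        this.offsetInSource = offsetInSource;
        this.lastSegment = lastSegment;
    }
}

// Hypothetical consumer that runs a once-per-collection action
// (e.g. writing a summary) whenever it sees lastSegment == true.
final class EndOfCollectionConsumer {
    private final List<String> processed = new ArrayList<>();
    private int endOfCollectionActions = 0;

    void process(DocInfo info) {
        processed.add(info.uri);
        if (info.lastSegment) {
            endOfCollectionActions++;
        }
    }

    int endOfCollectionActions() { return endOfCollectionActions; }

    int processedCount() { return processed.size(); }

    // Feeds one CAS per file, computing lastSegment either the current way
    // (index == size, after advancing the index) or always true.
    static EndOfCollectionConsumer run(List<String> files, boolean alwaysTrue) {
        EndOfCollectionConsumer consumer = new EndOfCollectionConsumer();
        int currentIndex = 0;
        for (String file : files) {
            currentIndex++; // assumed to mirror the reader advancing its index first
            boolean last = alwaysTrue || currentIndex == files.size();
            consumer.process(new DocInfo(file, 0, last));
        }
        return consumer;
    }
}
```

For three files, the current behavior triggers the end-of-collection action once; with lastSegment always true it fires three times, which is the kind of silent change an existing user could run into.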

A workaround, of course, is to maintain your own copy of this example code with this line
changed.

I suppose we could make this configurable via a parameter. If you would like to contribute
such a fix, we can put it in. For backwards compatibility, however, it should behave as it
does now when the parameter isn't specified.

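A configurable version might look roughly like the following sketch. The parameter name LastSegmentPerDocument and the LastSegmentPolicy class are hypothetical, not part of the UIMA examples. A null value models the parameter being absent, which keeps today's behavior. In a real reader the value would presumably be read once in initialize() (for example via getConfigParameterValue) and consulted where setLastSegment is called; the sketch does neither.

```java
// Hypothetical helper deciding the lastSegment value for each CAS.
final class LastSegmentPolicy {
    // Hypothetical configuration parameter name (not an existing one).
    static final String PARAM_LAST_SEGMENT_PER_DOCUMENT = "LastSegmentPerDocument";

    private final boolean perDocument;

    // configuredValue == null models "parameter not specified", which
    // falls back to the existing end-of-collection behavior.
    LastSegmentPolicy(Boolean configuredValue) {
        this.perDocument = Boolean.TRUE.equals(configuredValue);
    }

    // currentIndex is assumed to have been advanced past the file just read,
    // matching the existing expression mCurrentIndex == mFiles.size().
    boolean lastSegment(int currentIndex, int fileCount) {
        if (perDocument) {
            return true; // each file is a whole document, hence a single segment
        }
        return currentIndex == fileCount; // legacy: true only for the last file
    }
}
```

Defaulting to the legacy rule means existing descriptors that never mention the new parameter keep their current results.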
> FileSystemCollectionReader doesn't set lastSegment correctly
> ------------------------------------------------------------
>                 Key: UIMA-2670
>                 URL: https://issues.apache.org/jira/browse/UIMA-2670
>             Project: UIMA
>          Issue Type: Bug
>          Components: Examples
>    Affects Versions: 2.4.0SDK
>            Reporter: Jens Grivolla
>   Original Estimate: 10m
>  Remaining Estimate: 10m
> FileSystemCollectionReader only sets lastSegment=true (in the SourceDocumentInformation)
> on the last document. Given that it loads individual documents, not segments of a document,
> this should be "true" for each CAS that it generates.
> This is a problem when later using a CAS multiplier to segment the CAS. It should be
> possible to check whether the CAS is a complete document or a segment by testing for
> "offsetInSource==0 && lastSegment==true".
> In org.apache.uima.examples.cpe.FileSystemCollectionReader, line 166:
> srcDocInfo.setLastSegment(mCurrentIndex == mFiles.size());
> should be:
> srcDocInfo.setLastSegment(true);
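
The check proposed in the issue description can be sketched as follows. With the proposed fix, every CAS coming straight from the reader passes it; under the current behavior, every file except the last has lastSegment == false and would be mistaken for the first segment of a larger document. SegmentKind and SegmentClassifier are hypothetical names, and the meaning given to the non-zero-offset cases is an assumption about how a segmenting CAS multiplier would set these features.

```java
// Hypothetical classification of a CAS based on the two
// SourceDocumentInformation features named in the issue.
enum SegmentKind { COMPLETE_DOCUMENT, FIRST_SEGMENT, MIDDLE_SEGMENT, LAST_SEGMENT }

final class SegmentClassifier {
    // The test from the issue: offsetInSource == 0 && lastSegment == true.
    static boolean isCompleteDocument(int offsetInSource, boolean lastSegment) {
        return offsetInSource == 0 && lastSegment;
    }

    // Assumed interpretation of the remaining combinations for segments
    // produced by a CAS multiplier.
    static SegmentKind classify(int offsetInSource, boolean lastSegment) {
        if (isCompleteDocument(offsetInSource, lastSegment)) {
            return SegmentKind.COMPLETE_DOCUMENT;
        }
        if (offsetInSource == 0) {
            return SegmentKind.FIRST_SEGMENT;
        }
        return lastSegment ? SegmentKind.LAST_SEGMENT : SegmentKind.MIDDLE_SEGMENT;
    }
}
```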

This message is automatically generated by JIRA.
