uima-dev mailing list archives

From "Marshall Schor (JIRA)" <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-2670) FileSystemCollectionReader doesn't set lastSegment correctly
Date Sun, 16 Jun 2013 00:44:20 GMT

    [ https://issues.apache.org/jira/browse/UIMA-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684534#comment-13684534 ]

Marshall Schor commented on UIMA-2670:
--------------------------------------

I'm imagining the use case is one where CASes are sent into a pipeline, and some annotators
want to know when the last CAS is coming through, to trigger some special computation.  

Our download figures for UIMA are only approximate, but they were around 10,000 per month
(as of a year or more ago; the stats have since stopped working), so there are a lot of
potential users out there who may rely on the current behavior...

I think it's fine to add a parameter to this reader which, when set, changes the behavior to
always set the "final segment" flag (lastSegment=true), if that will work for you and your users.
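To make the proposal concrete, here is a minimal plain-Java sketch of the two behaviors. It does not use the UIMA API; `LastSegmentSketch` and the `alwaysSetLastSegment` flag are hypothetical names, not an existing reader parameter. It assumes, as the expression quoted from the reader below implies, that `mCurrentIndex` has already been advanced past the file being read when lastSegment is computed. With the flag off only the final file is marked; with it on every CAS is marked.

```java
import java.util.List;

// Hypothetical plain-Java model of the reader's lastSegment decision.
// It does not use the UIMA API and is not the real FileSystemCollectionReader.
public class LastSegmentSketch {

    private final List<String> mFiles;
    // Hypothetical opt-in parameter; false keeps the existing behavior.
    private final boolean alwaysSetLastSegment;
    private int mCurrentIndex = 0;

    public LastSegmentSketch(List<String> files, boolean alwaysSetLastSegment) {
        this.mFiles = files;
        this.alwaysSetLastSegment = alwaysSetLastSegment;
    }

    public boolean hasNext() {
        return mCurrentIndex < mFiles.size();
    }

    // "Reads" the next file and returns the lastSegment value its CAS would get.
    public boolean nextLastSegment() {
        mCurrentIndex++; // assumed: index is advanced past the current file before the check
        // Existing expression: true only once the final file has been read.
        boolean existing = mCurrentIndex == mFiles.size();
        return alwaysSetLastSegment || existing;
    }

    // Counts how many CASes would be flagged lastSegment=true for a list of files.
    static int countLastSegments(List<String> files, boolean alwaysSetLastSegment) {
        LastSegmentSketch reader = new LastSegmentSketch(files, alwaysSetLastSegment);
        int count = 0;
        while (reader.hasNext()) {
            if (reader.nextLastSegment()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.txt", "b.txt", "c.txt");
        System.out.println(countLastSegments(files, false)); // 1 (current behavior)
        System.out.println(countLastSegments(files, true));  // 3 (parameter set)
    }
}
```

Defaulting the flag to false leaves existing pipelines unchanged, which is the reason for making the new behavior opt-in rather than changing the default.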
                
> FileSystemCollectionReader doesn't set lastSegment correctly
> ------------------------------------------------------------
>
>                 Key: UIMA-2670
>                 URL: https://issues.apache.org/jira/browse/UIMA-2670
>             Project: UIMA
>          Issue Type: Bug
>          Components: Examples
>    Affects Versions: 2.4.0SDK
>            Reporter: Jens Grivolla
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> FileSystemCollectionReader only sets lastSegment=true (in the SourceDocumentInformation)
> on the last document. Given that it loads individual documents, not segments of a document,
> this should be "true" for each CAS that it generates.
> This is a problem when later using a CAS multiplier to segment the CAS. It should be
> possible to check whether the CAS is a complete document or a segment by testing for
> "offsetInSource==0 && lastSegment==true".
> In org.apache.uima.examples.cpe.FileSystemCollectionReader, line 166:
> srcDocInfo.setLastSegment(mCurrentIndex == mFiles.size());
> should be:
> srcDocInfo.setLastSegment(true);
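The check described in the report ("offsetInSource==0 && lastSegment==true") can be sketched as below. `SourceInfo` is a hypothetical stand-in holding only the offsetInSource and lastSegment values from SourceDocumentInformation; it is not the UIMA JCas type. The second case shows why the current reader behavior is a problem: an unsegmented file that is not the last one arrives with lastSegment=false and is misclassified as a segment.

```java
// Hypothetical stand-in for the two SourceDocumentInformation values the check needs;
// this is not the UIMA JCas type.
public class SegmentCheckSketch {

    static final class SourceInfo {
        final int offsetInSource;
        final boolean lastSegment;

        SourceInfo(int offsetInSource, boolean lastSegment) {
            this.offsetInSource = offsetInSource;
            this.lastSegment = lastSegment;
        }
    }

    // A CAS holds a complete document if it starts at offset 0 and is also the last segment.
    static boolean isCompleteDocument(SourceInfo info) {
        return info.offsetInSource == 0 && info.lastSegment;
    }

    public static void main(String[] args) {
        // true: a whole document
        System.out.println(isCompleteDocument(new SourceInfo(0, true)));
        // false: a first segment, or (with the current reader) any unsegmented file except the last
        System.out.println(isCompleteDocument(new SourceInfo(0, false)));
        // false: the final segment of a document that was split
        System.out.println(isCompleteDocument(new SourceInfo(512, true)));
    }
}
```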

--
This message is automatically generated by JIRA.
