uima-dev mailing list archives

From "Richard Eckart de Castilho (JIRA)" <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-5601) uv3: CasCopier problems with custom subclasses of DocumentAnnotation
Date Tue, 03 Oct 2017 20:29:00 GMT

    [ https://issues.apache.org/jira/browse/UIMA-5601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16190293#comment-16190293 ]

Richard Eckart de Castilho commented on UIMA-5601:

Peter keeps telling me that DKPro Core should stop using a custom subclass of DocumentAnnotation.
At some point, that may happen. How soon may well depend on the conclusions we reach
here. Right now, however, the fact that DKPro Core does this helps to detect some changes going from
v2 to v3. What I am trying to say is: an elaborate engineering solution might go a bit too far.

I believe the "normal" way to use a custom document annotation is to replace the JAR file
that contains the default UIMA DocumentAnnotation with an alternative one. However, that is
IMHO quite uncomfortable.

Indeed, having multiple subtypes of DocumentAnnotation (or even multiple instances of it)
in a CAS/view is likely an error. In the long term, it might not be a bad idea to make DocumentAnnotation
final so that no subclasses are allowed. That would make it much easier to handle, but it
would likely break things for existing users.

That said, here is what happens in DKPro Core:

* The DocumentMetaData JCas class has a static "create(JCas)" method which checks whether there
is already a default UIMA DocumentAnnotation; if so, it copies the information contained
in it, deletes it, and then adds a new DocumentMetaData with the previously copied information
* All reader components have some logic to create a DKPro Core DocumentMetaData annotation
* When DKPro Core uses the CasCopier, the target CAS is initialized with a DKPro Core
DocumentMetaData before annotations are copied over
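To make the "create(JCas)" steps above concrete, here is a minimal, self-contained Java sketch of the "replace the default document annotation with a subclass" pattern. It deliberately does not use the UIMA or DKPro Core APIs: `DocAnn`, `DocMeta`, `ToyCas`, and `DocMetaFactory.create` are hypothetical stand-ins for DocumentAnnotation, DocumentMetaData, a CAS view, and `DocumentMetaData.create(JCas)`, and only the begin/end/language features are modeled. The fallback language value `x-unspecified` is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for UIMA's DocumentAnnotation (not the real class).
class DocAnn {
  final int begin;
  final int end;
  final String language;

  DocAnn(int begin, int end, String language) {
    this.begin = begin;
    this.end = end;
    this.language = language;
  }
}

// Hypothetical stand-in for DKPro Core's DocumentMetaData subclass.
class DocMeta extends DocAnn {
  String documentId; // an extra feature that only the subclass carries

  DocMeta(int begin, int end, String language) {
    super(begin, end, language);
  }
}

// Toy model of a CAS view: just the list of indexed annotations.
class ToyCas {
  final List<DocAnn> index = new ArrayList<>();
}

// Hypothetical analogue of DocumentMetaData.create(JCas); not DKPro Core's code.
class DocMetaFactory {
  static DocMeta create(ToyCas cas, int docLength) {
    // Look for a default document annotation (exact class, not a subclass).
    DocAnn existing = null;
    for (DocAnn a : cas.index) {
      if (a.getClass() == DocAnn.class) {
        existing = a;
        break;
      }
    }
    DocMeta meta;
    if (existing != null) {
      // Copy its information, then delete it from the index.
      meta = new DocMeta(existing.begin, existing.end, existing.language);
      cas.index.remove(existing);
    } else {
      // No default annotation: cover the whole document (assumed default language).
      meta = new DocMeta(0, docLength, "x-unspecified");
    }
    // Add the subclass instance in place of the default one.
    cas.index.add(meta);
    return meta;
  }
}
```

The exact-class check (`getClass() == DocAnn.class`) mirrors the description, which only mentions replacing a default UIMA DocumentAnnotation; the sketch leaves exactly one document annotation in the view either way.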

> uv3: CasCopier problems with custom subclasses of DocumentAnnotation
> --------------------------------------------------------------------
>                 Key: UIMA-5601
>                 URL: https://issues.apache.org/jira/browse/UIMA-5601
>             Project: UIMA
>          Issue Type: Bug
>          Components: Core Java Framework
>    Affects Versions: 3.0.0SDK-beta
>            Reporter: Richard Eckart de Castilho
> There seems to be a bug in the way the CasCopier handles the document annotation.

> Specifically, the CasCopier seems to mishandle the case where the target
> CAS already contains a document annotation. In my case, I do the following:
> * create the target CAS
> * add a document annotation (DocumentMetaData extends DocumentAnnotation) to the target
> * create the CasCopier with the source and target CAS
> * copy several FSes but *not* the document annotation
> Expected:
> * target CAS contains 1 DocumentMetaData annotation
> Actual:
> * target CAS contains 2 DocumentMetaData annotations
> Also, it seems that `isDocumentAnnotation` may not be able to handle a CAS that uses a
> custom subclass of DocumentAnnotation:
> {noformat}
>   private <T extends FeatureStructure> boolean isDocumentAnnotation(T aFS) {
>     if (((TOP)aFS)._getTypeCode() != TypeSystemConstants.docTypeCode) {
>       return false;
>     }
>     if (srcCasDocumentAnnotation == null) {
>       srcCasDocumentAnnotation = srcCasViewImpl.getDocumentAnnotationNoCreate(); 
>     }
>     return aFS == srcCasDocumentAnnotation;
>   }
> {noformat}
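The first test in the quoted `isDocumentAnnotation` compares the FS's type code for equality with `docTypeCode`. Assuming a custom subtype such as DocumentMetaData has its own, different type code, that test returns `false` before the identity check against `srcCasDocumentAnnotation` is ever reached. The self-contained Java sketch below contrasts such an exact type-code comparison with a subsumption check that walks the supertype chain; `ToyTypeSystem` and its numeric type codes are hypothetical. In the real UIMA API, `TypeSystem.subsumes(Type superType, Type subType)` provides this kind of check.

```java
import java.util.HashMap;
import java.util.Map;

// Toy type system: each type code maps to its supertype code (hypothetical codes).
class ToyTypeSystem {
  static final int TOP = 1;
  static final int ANNOTATION = 2;
  static final int DOC_ANNOTATION = 3; // plays the role of docTypeCode
  static final int DOC_META_DATA = 4;  // custom subtype, e.g. DocumentMetaData

  private final Map<Integer, Integer> parent = new HashMap<>();

  ToyTypeSystem() {
    parent.put(ANNOTATION, TOP);
    parent.put(DOC_ANNOTATION, ANNOTATION);
    parent.put(DOC_META_DATA, DOC_ANNOTATION);
  }

  // Exact comparison, analogous to the quoted _getTypeCode() != docTypeCode test.
  static boolean isExactlyDocType(int typeCode) {
    return typeCode == DOC_ANNOTATION;
  }

  // Subsumption check: true for the document annotation type and all of its subtypes.
  boolean isDocTypeOrSubtype(int typeCode) {
    Integer t = typeCode;
    while (t != null) {
      if (t == DOC_ANNOTATION) {
        return true;
      }
      t = parent.get(t); // TOP has no entry, so the walk ends with null
    }
    return false;
  }
}
```

With the exact comparison, an instance of the subtype is rejected outright; with the subsumption check it passes the type test, and the identity comparison with the source CAS's document annotation would then decide.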

This message was sent by Atlassian JIRA
