uima-dev mailing list archives

From "Marshall Schor (JIRA)" <...@uima.apache.org>
Subject [jira] Commented: (UIMA-1861) SnowballAnnotator needs refactoring
Date Thu, 02 Sep 2010 19:21:54 GMT

    [ https://issues.apache.org/jira/browse/UIMA-1861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905646#action_12905646 ]

Marshall Schor commented on UIMA-1861:

Thanks for the fixes/patch.  Here are a few suggested changes, to take advantage of JCas better.
(I've attached this version as patch2 above)

1) Although this annotator is set up as a JCas annotator, it is missing the JCas type for
TokenAnnotation.  Because of this, it goes to some lengths to avoid using this type where
it would be useful.  Adding the JCas cover types is easy: open the desc/SnowballAnnotator.xml
descriptor in the Component Descriptor Editor in Eclipse, go to the Type System page, and click
the JCasGen button.  This generates the missing classes for the types and adds them to
the project.

If the TokenAnnotation JCas type were available, the lines:

      // iterate over all token annotations and add stem if available
      FSIterator tokenIterator = aJCas.getCas().getAnnotationIndex(this.tokenAnnotation).iterator();

(or, with the patch applied:)

      // iterate over all token annotations and add stem if available
      FSIterator tokenIterator = aJCas.getAnnotationIndex(this.tokenAnnotation).iterator();
      // note: the raw FSIterator type triggers a generics warning, which then needs a @SuppressWarnings

could be written
    // iterate over all token annotations and add stem if available
      FSIterator<TokenAnnotation> tokenIterator = (FSIterator<TokenAnnotation>)(FSIterator<?>)
 // very ugly "double-fisted cast"
and the code in the bottom method (typeSystemInit) would not be needed. The "double-fisted
cast" is described at http://markmail.org/message/w5kpympalj6tvqq3.

Alternatively, to avoid the double cast, the FSIterator could be declared over the type Annotation,
and the result of next() could be cast explicitly to TokenAnnotation:

      // iterate over all token annotations and add stem if available
      FSIterator<Annotation> tokenIterator = aJCas.getAnnotationIndex(TokenAnnotation.type).iterator();
      TokenAnnotation annot = (TokenAnnotation) tokenIterator.next();
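
For anyone who wants to try the two casting options outside UIMA, here is a minimal sketch using plain java.util iterators. The classes Annotation and TokenAnnotation below are hypothetical stand-ins defined inside the example, not the UIMA or JCasGen-generated classes, and the method names are made up for illustration. The only point is to show why the cast has to go through a wildcard type and what the per-element alternative looks like.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class IteratorCastSketch {
    // Hypothetical stand-ins; these are NOT the UIMA Annotation / TokenAnnotation classes.
    static class Annotation {
        final String text;
        Annotation(String text) { this.text = text; }
    }

    static class TokenAnnotation extends Annotation {
        TokenAnnotation(String text) { super(text); }
    }

    // Option 1: the double cast. A direct cast from Iterator<Annotation> to
    // Iterator<TokenAnnotation> does not compile, so the cast goes through
    // Iterator<?> first. It is unchecked, hence the @SuppressWarnings.
    @SuppressWarnings("unchecked")
    static Iterator<TokenAnnotation> asTokenIterator(Iterator<Annotation> it) {
        return (Iterator<TokenAnnotation>) (Iterator<?>) it;
    }

    static List<String> textsByDoubleCast(Iterator<Annotation> it) {
        List<String> out = new ArrayList<>();
        Iterator<TokenAnnotation> tokens = asTokenIterator(it);
        while (tokens.hasNext()) {
            out.add(tokens.next().text);
        }
        return out;
    }

    // Option 2: keep the iterator over the base type and cast each element.
    // No unchecked warning; a wrong element type fails here with a
    // ClassCastException.
    static List<String> textsByElementCast(Iterator<Annotation> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            TokenAnnotation token = (TokenAnnotation) it.next();
            out.add(token.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Annotation> index = new ArrayList<>();
        index.add(new TokenAnnotation("running"));
        index.add(new TokenAnnotation("jumps"));
        System.out.println(textsByDoubleCast(index.iterator()));
        System.out.println(textsByElementCast(index.iterator()));
    }
}
```

Both methods return the same list when every element really is a TokenAnnotation. The second form trades one cast per element for not needing the suppressed warning.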

The line further on down which reads

        // get stemmer result and set annotation feature
        annot.setStringValue(this.tokenAnnotationStemmFeature, stemmer.getCurrent());

would be better written (using JCas style) as:

        // get stemmer result and set annotation feature

If the JCas style is used, the typeSystemInit method can be deleted, along with all the constants
added to support it, because the values it computes are no longer used.  In any case, it should
not be called from the process method.  (The UIMA framework calls it directly, but only when
the type system changes.)

> SnowballAnnotator needs refactoring
> -----------------------------------
>                 Key: UIMA-1861
>                 URL: https://issues.apache.org/jira/browse/UIMA-1861
>             Project: UIMA
>          Issue Type: Bug
>          Components: Sandbox-SnowballAnnotator
>    Affects Versions: 2.3.1
>            Reporter: Tommaso Teofili
>            Assignee: Tommaso Teofili
>             Fix For: 2.3.1
>         Attachments: SnowballAnnotatorPatch2.txt, UIMA1861-patch.txt
> SnowballAnnotator is extending the deprecated JTextAnnotator_ImplBase, has some unused imports, and generics should be enabled.
> Moreover the initialize() method fails due to the AnnotatorContext object being null when run in a 2.3.1-SNAPSHOT distribution.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
