uima-dev mailing list archives

From "Nicolas Hernandez (JIRA)" <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-2110) Turn the HMMTagger class into a more generic class for tagging tasks
Date Thu, 26 May 2011 12:48:47 GMT

    [ https://issues.apache.org/jira/browse/UIMA-2110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039687#comment-13039687 ]

Nicolas Hernandez commented on UIMA-2110:

More information about the process used to build the models can be found here: http://enicolashernandez.blogspot.com/2011/05/construire-des-modelisations-du-french.html
(incidentally, it is in French).

By the way, what about the current submission? Would it have been better to split the
submission of the various generic parameters? For example, on the one hand, the ones that
handle the view, the sentence type and the feature path of the annotation to set when tagging,
and, on the other hand, the mechanism for specifying the model by parameter.

Let me know

> Turn the HMMTagger class into a more generic class for tagging tasks  
> ----------------------------------------------------------------------
>                 Key: UIMA-2110
>                 URL: https://issues.apache.org/jira/browse/UIMA-2110
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Sandbox-Tagger
>    Affects Versions: 2.3
>         Environment: OS
> Linux version 2.6.32-30-generic (buildd@vernadsky) (gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5)) #59-Ubuntu SMP Tue Mar 1 21:30:21 UTC 2011
> java version "1.6.0_17"
> Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
> Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
>            Reporter: Nicolas Hernandez
>            Priority: Minor
>         Attachments: AMoreGenericHMMTaggerDesc.patch, AMoreGenericHMMTaggerSrcClass.patch
>   Original Estimate: 1.5h
>  Remaining Estimate: 1.5h
> Despite its name, the code of the org.apache.uima.examples.tagger.HMMTagger
> class is not totally independent of the POS tagging task.
> In addition, it assumes that the feature path to update with the result of the
> tagging is org.apache.uima.TokenAnnotation:posTag.
> We propose to let users specify, by parameter, the feature
> path to set. This parameter is optional. If it is left empty, the tagger
> works as usual, using org.apache.uima.TokenAnnotation:posTag as the default value.
> By the way, we propose to add three optional parameters : InputView, SentenceType and
> Since the HMM Learner already allows specifying the view used to
> train a model, we decided to give the tagger the same possibility.
> By default, it works on the _InitialView. This is quite useful in practice!
> The org.apache.uima.TokenAnnotation type is not the only annotation type which is assumed
> to be present in the CAS. The HMMTagger actually processes tokens sentence by sentence,
> using org.apache.uima.SentenceAnnotation to select the tokens. The SentenceType parameter
> lets users specify their own sentence annotation type. The default value is
> org.apache.uima.SentenceAnnotation.
> The ModelFile parameter is an alternative to the resource declaration for specifying a model.
> If left empty, it is ignored. Otherwise, it takes precedence over the resource declaration.
> When it is specified, multiple deployment of the tagger cannot be allowed, but in practice
> it may be easier for the user to configure a parameter through Eclipse.
> Two distinct patches will be provided, one for the class and the other for the descriptor.
> A future improvement of the class might offer the possibility to create new annotations,
> not only to update existing ones.
> A future improvement of the descriptor may separate what is up to the generic tagger from
> what is relevant only to the POS tagger...
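
The parameter handling described in the issue above can be sketched roughly as follows. This is not the code of the attached patches: it is a minimal, dependency-free illustration that assumes the parameter values have already been read into a map. The class name TaggerParams and the parameter name "FeaturePath" are hypothetical (the issue does not name the feature path parameter); InputView, SentenceType and ModelFile, as well as the default values, are taken from the issue text.

```java
import java.util.Map;

/** Hypothetical helper (not part of UIMA) resolving the optional tagger parameters. */
class TaggerParams {
    // Default values as stated in the issue description.
    static final String DEFAULT_VIEW = "_InitialView";
    static final String DEFAULT_SENTENCE_TYPE = "org.apache.uima.SentenceAnnotation";
    static final String DEFAULT_FEATURE_PATH = "org.apache.uima.TokenAnnotation:posTag";

    final String inputView;
    final String sentenceType;
    final String tokenType;   // part of the feature path before ':'
    final String featureName; // part of the feature path after ':'
    final String modelFile;   // null when the resource declaration should be used

    TaggerParams(Map<String, String> params) {
        inputView = orDefault(params.get("InputView"), DEFAULT_VIEW);
        sentenceType = orDefault(params.get("SentenceType"), DEFAULT_SENTENCE_TYPE);

        // "FeaturePath" is an assumed parameter name, in the form <type>:<feature>.
        String path = orDefault(params.get("FeaturePath"), DEFAULT_FEATURE_PATH);
        int sep = path.lastIndexOf(':');
        if (sep <= 0 || sep == path.length() - 1) {
            throw new IllegalArgumentException("Expected <type>:<feature>, got: " + path);
        }
        tokenType = path.substring(0, sep);
        featureName = path.substring(sep + 1);

        // An empty ModelFile is ignored; a non-empty one wins over the resource declaration.
        modelFile = orDefault(params.get("ModelFile"), null);
    }

    /** True when the model must be loaded from the ModelFile parameter. */
    boolean useModelFile() {
        return modelFile != null;
    }

    /** Returns the trimmed value, or the fallback when the value is missing or blank. */
    private static String orDefault(String value, String fallback) {
        if (value == null || value.trim().isEmpty()) {
            return fallback;
        }
        return value.trim();
    }
}
```

With an empty map, every field falls back to the default given in the issue and useModelFile() returns false, so the model would come from the resource declaration. In a real annotator the map would presumably be filled from the UIMA context's configuration parameters rather than built by hand.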
