uima-dev mailing list archives

From "Fabien POULARD (JIRA)" <...@uima.apache.org>
Subject [jira] Updated: (UIMA-1833) Create an AE for training the HMM Tagger models
Date Thu, 08 Jul 2010 17:19:50 GMT

     [ https://issues.apache.org/jira/browse/UIMA-1833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Fabien POULARD updated UIMA-1833:

    Attachment: model-trainer-ae.patch

A proposal for such an analysis engine.

It was designed to minimize changes to the HMM Tagger code: the only change required is to make
the ModelGeneration init method public instead of private.
It takes two parameters:
 - where the model file will be written
 - a feature path to the POS values to learn
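To illustrate what these two parameters do, here is a minimal sketch in plain Java (8 or later). It is not the attached patch and does not use the UIMA API: the class name `TrainerParams`, the example path `model/fr-hmm.dat`, the feature names `pos` and `value`, and the representation of annotations as nested maps are all hypothetical. It only shows how a slash-separated feature path can be followed from a token annotation down to its POS value; in a real UIMA component the values would come from the descriptor's configuration parameters and be resolved against the CAS.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the trainer AE's two parameters.
// Annotations are modelled as nested maps (feature name -> value);
// this is NOT the UIMA CAS API.
class TrainerParams {
    final String modelFile;     // where the trained model will be written
    final List<String> posPath; // feature names leading to the POS value

    TrainerParams(String modelFile, String posFeaturePath) {
        this.modelFile = modelFile;
        this.posPath = new ArrayList<>();
        // Accept "pos", "/pos" or "pos/value": split on '/' and skip empty segments.
        for (String segment : posFeaturePath.split("/")) {
            if (!segment.isEmpty()) {
                posPath.add(segment);
            }
        }
        if (posPath.isEmpty()) {
            throw new IllegalArgumentException("empty feature path");
        }
    }

    // Follows the feature path from an annotation; returns null if any step is missing.
    String resolvePos(Map<String, Object> annotation) {
        Object current = annotation;
        for (String feature : posPath) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(feature);
        }
        return current == null ? null : current.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> posFs = new HashMap<>();
        posFs.put("value", "NC");
        Map<String, Object> token = new HashMap<>();
        token.put("coveredText", "chat");
        token.put("pos", posFs);

        TrainerParams params = new TrainerParams("model/fr-hmm.dat", "/pos/value");
        System.out.println(params.resolvePos(token)); // prints NC
    }
}
```

Keeping the POS location as a path rather than a fixed type and feature is what lets the same trainer work with corpora whose type systems store the tag in different places.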

This analysis engine has been tested on the French Treebank (http://www.llf.cnrs.fr/Gens/Abeille/French-Treebank-fr.php)
corpus, and the generated model has been used successfully by the HMM Tagger component.

The generated French model may be contributed later; we are waiting for authorization from
the corpus creators.

> Create an AE for training the HMM Tagger models
> -----------------------------------------------
>                 Key: UIMA-1833
>                 URL: https://issues.apache.org/jira/browse/UIMA-1833
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Sandbox-Tagger
>         Environment: OS:
> Debian Linux Squeeze 64bits
> JVM:
> java version "1.6.0_20"
> Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
> Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)
>            Reporter: Fabien POULARD
>            Assignee: Fabien POULARD
>            Priority: Minor
>             Fix For: 2.3CE
>         Attachments: model-trainer-ae.patch
> There is a class to train an HMM Tagger model from a corpus. However, it is
> a standalone application that does not take advantage of UIMA's capabilities. It would be
> better to train such a model with an analysis engine.
> A training CPE would look like this:
>  1- a collection reader that loads the gold-standard corpus;
>  2- the HMM Tagger model trainer analysis engine, which would browse specific annotations,
> extract the material needed to feed the learning algorithm, and finally export a model file.
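The training CPE described above can be sketched without UIMA to show the division of work: the collection reader hands one tagged sentence at a time to the trainer, which only accumulates counts, and the model file is written once, after the whole corpus has been seen (in a real analysis engine that last step would naturally go in `collectionProcessComplete()`). This is a hypothetical illustration assuming Java 8 or later: the class `HmmTrainerSketch`, its methods, the bigram counting, and the tab-separated output format are assumptions and do not reflect the HMM Tagger's actual `ModelGeneration` class or model format.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the HMM Tagger's ModelGeneration: a bigram HMM
// trainer that mimics an AE lifecycle without depending on UIMA.
class HmmTrainerSketch {
    static final String START = "<s>"; // pseudo-tag preceding each sentence

    // Emission counts: tag -> (word -> count).
    final Map<String, Map<String, Integer>> emissions = new TreeMap<>();
    // Transition counts: previous tag -> (tag -> count).
    final Map<String, Map<String, Integer>> transitions = new TreeMap<>();

    private static void increment(Map<String, Map<String, Integer>> table,
                                  String outer, String inner) {
        table.computeIfAbsent(outer, k -> new TreeMap<>()).merge(inner, 1, Integer::sum);
    }

    // Called once per gold-standard sentence, as an AE's process() would be
    // called once per document: it only accumulates counts.
    void process(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags differ in length");
        }
        String previous = START;
        for (int i = 0; i < words.size(); i++) {
            increment(emissions, tags.get(i), words.get(i));
            increment(transitions, previous, tags.get(i));
            previous = tags.get(i);
        }
    }

    // Called once after the whole corpus has been read, like
    // collectionProcessComplete(): writes raw counts as tab-separated lines.
    void complete(Path modelFile) throws IOException {
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(modelFile, StandardCharsets.UTF_8))) {
            emissions.forEach((tag, words) -> words.forEach(
                    (word, n) -> out.println("E\t" + tag + "\t" + word + "\t" + n)));
            transitions.forEach((prev, next) -> next.forEach(
                    (tag, n) -> out.println("T\t" + prev + "\t" + tag + "\t" + n)));
        }
    }

    public static void main(String[] args) throws IOException {
        HmmTrainerSketch trainer = new HmmTrainerSketch();
        trainer.process(Arrays.asList("Le", "chat", "dort"), Arrays.asList("DET", "NC", "V"));
        trainer.process(Arrays.asList("Le", "chien", "dort"), Arrays.asList("DET", "NC", "V"));
        Path model = Files.createTempFile("hmm-model", ".tsv");
        trainer.complete(model);
        List<String> lines = Files.readAllLines(model, StandardCharsets.UTF_8);
        System.out.println(lines.size()); // prints 7
        Files.delete(model);
    }
}
```

Counting during processing and writing only at the end keeps the per-document work cheap, and a run that is interrupted part-way through the corpus never leaves a half-written model file behind.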

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
