uima-dev mailing list archives

From Peter Klügl (JIRA) <...@uima.apache.org>
Subject [jira] [Resolved] (UIMA-2397) TextMarker: Improve overall functionality in use cases with very large artifacts
Date Thu, 21 Mar 2013 13:47:15 GMT

     [ https://issues.apache.org/jira/browse/UIMA-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Klügl resolved UIMA-2397.
-------------------------------

    Resolution: Fixed

done
                
> TextMarker: Improve overall functionality in use cases with very large artifacts
> --------------------------------------------------------------------------------
>
>                 Key: UIMA-2397
>                 URL: https://issues.apache.org/jira/browse/UIMA-2397
>             Project: UIMA
>          Issue Type: Improvement
>          Components: TextMarker
>    Affects Versions: 2.0.0TextMarker
>            Reporter: Peter Klügl
>            Assignee: Peter Klügl
>             Fix For: 2.0.1TextMarker
>
>
> TextMarker is not applicable in use cases with very large artifacts, e.g., documents with 500k - 1M tokens.
> Adapt or exchange the rule language to allow the user to handle such texts:
> - reduce the memory footprint of TextMarkerBasic inference annotations, or make it configurable.
> - add the concept of simple rules that match only on a single regular expression, for adding annotations without inference annotations (related to UIMA-2331).
> - allow the user to skip seeding at the startup of the engine and to apply the seeders to certain annotations during rule inference.
> - introduce language concepts that enable the user to split documents into multiple CASs.
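
Two of the ideas listed above (regex-only rules that add annotations without per-token inference data, and splitting a large document into smaller pieces) can be sketched in a few lines. This is only an illustration in plain Python, not the TextMarker rule language or the UIMA API; the names regex_annotate and split_text, the span tuple layout, and the max_chars parameter are all assumptions made for this example.

```python
import re
from typing import Iterator, List, Tuple

# Hypothetical span representation: (begin, end, type name).
Span = Tuple[int, int, str]


def regex_annotate(text: str, pattern: str, type_name: str,
                   offset: int = 0) -> List[Span]:
    """Return one span per regex match.

    No per-token bookkeeping is built, which is the point of a
    regex-only rule. `offset` shifts spans back to document
    coordinates when `text` is only a chunk of a larger document.
    """
    return [(m.start() + offset, m.end() + offset, type_name)
            for m in re.finditer(pattern, text)]


def split_text(text: str, max_chars: int) -> Iterator[Tuple[int, str]]:
    """Yield (offset, chunk) pairs of at most max_chars characters.

    Chunks are cut just after a space where one is available, so
    that whitespace-delimited tokens are not split across chunks.
    """
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            cut = text.rfind(" ", start, end)
            if cut > start:
                end = cut + 1
        yield start, text[start:end]
        start = end


if __name__ == "__main__":
    doc = "ab 12 cd 345 ef"
    spans = []
    for chunk_offset, chunk in split_text(doc, 6):
        spans.extend(regex_annotate(chunk, r"\d+", "NUM", chunk_offset))
    print(spans)
```

Processing chunk by chunk keeps only one piece of the text in play at a time. The caveat is that a match spanning a chunk boundary would be missed, so the cut points have to be chosen with the patterns in mind (here, whitespace boundaries suffice for patterns that never match across a space).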

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
