uima-dev mailing list archives

From Peter Klügl (JIRA) <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-2397) TextMarker: Improve overall functionality in use cases with very large artifacts
Date Wed, 20 Mar 2013 09:11:16 GMT

    [ https://issues.apache.org/jira/browse/UIMA-2397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607429#comment-13607429 ]

Peter Klügl commented on UIMA-2397:
-----------------------------------

About "introduce language concepts that enable the user to split documents into multiple CASs":
This will not be included. Users should solve this with common UIMA functionality, e.g., by
including an AE in their script.
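In UIMA, splitting one large document into several CASs is typically the job of a CAS multiplier (for example, a component extending `JCasMultiplier_ImplBase`) that emits one output CAS per segment. The sketch below shows only the segmentation step in plain Java, assuming segments are bounded by a character budget and cut at whitespace so tokens stay intact. `DocumentSplitter` and `split` are hypothetical names, not part of UIMA or TextMarker.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not a UIMA or TextMarker API): splits a large document
 * text into segments of at most maxChars characters, breaking at whitespace
 * where possible. In a real pipeline, a CAS multiplier would put each segment
 * into its own CAS.
 */
public final class DocumentSplitter {
    private DocumentSplitter() {}

    public static List<String> split(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> segments = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + maxChars, text.length());
            if (end < text.length()) {
                // Walk back to the last whitespace so a token is not cut in half.
                int cut = end;
                while (cut > start && !Character.isWhitespace(text.charAt(cut - 1))) {
                    cut--;
                }
                // If a single token is longer than maxChars, fall back to a hard cut.
                if (cut > start) {
                    end = cut;
                }
            }
            segments.add(text.substring(start, end));
            start = end;
        }
        return segments;
    }

    public static void main(String[] args) {
        for (String segment : split("one two three four five", 10)) {
            System.out.println("[" + segment + "]");
        }
    }
}
```

Because the segments concatenate back to the original text, a segment's offset in the full document is the sum of the lengths of the segments before it, which is what lets annotations found in a segment CAS be mapped back to the original document.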
                
> TextMarker: Improve overall functionality in use cases with very large artifacts
> --------------------------------------------------------------------------------
>
>                 Key: UIMA-2397
>                 URL: https://issues.apache.org/jira/browse/UIMA-2397
>             Project: UIMA
>          Issue Type: Improvement
>          Components: TextMarker
>    Affects Versions: 2.0.0TextMarker
>            Reporter: Peter Klügl
>            Assignee: Peter Klügl
>             Fix For: 2.0.1TextMarker
>
>
> TextMarker is not applicable in use cases with very large artifacts, e.g., documents with 500k-1M tokens.
> Adapt or exchange the rule language to allow the user to handle such texts:
> - reduce the memory footprint of TextMarkerBasic inference annotations, or make it configurable.
> - add the concept of simple rules that match only on a single regular expression for adding annotations without inference annotations (related to UIMA-2331).
> - allow the user to skip seeding at engine startup and to apply the seeders to certain annotations during rule inference.
> - introduce language concepts that enable the user to split documents into multiple CASs.
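The second item in the description refers to rules that match a single regular expression and add annotations directly, without creating inference annotations. The sketch below illustrates that idea in plain Java; `SimpleRegexRule` and `Span` are hypothetical names, not TextMarker's implementation or API, and a real engine would create UIMA annotations in the CAS rather than `Span` records. It assumes Java 16 or later for the record type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical illustration (not TextMarker's API): a "simple rule" that runs
 * one regular expression over the document text and records each match as a
 * typed character span, without any per-token inference annotations.
 */
public final class SimpleRegexRule {
    /** Minimal stand-in for an annotation: type name plus character offsets. */
    public record Span(String type, int begin, int end) {}

    private final Pattern pattern;
    private final String type;

    public SimpleRegexRule(String regex, String type) {
        this.pattern = Pattern.compile(regex);
        this.type = type;
    }

    /** Returns one span per match, in document order. */
    public List<Span> apply(String documentText) {
        List<Span> spans = new ArrayList<>();
        Matcher m = pattern.matcher(documentText);
        while (m.find()) {
            spans.add(new Span(type, m.start(), m.end()));
        }
        return spans;
    }

    public static void main(String[] args) {
        SimpleRegexRule rule = new SimpleRegexRule("\\d{4}", "Year");
        System.out.println(rule.apply("Released in 2012, updated in 2013."));
    }
}
```

Each rule costs one matcher pass over the text, and memory grows with the number of matches rather than the number of tokens, which is the property that matters for documents with hundreds of thousands of tokens.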

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
