uima-dev mailing list archives

From Peter Klügl (JIRA) <...@uima.apache.org>
Subject [jira] [Created] (UIMA-2397) TextMarker: Improve overall functionality in use cases with very large artifacts
Date Fri, 04 May 2012 12:02:48 GMT
Peter Klügl created UIMA-2397:
---------------------------------

             Summary: TextMarker: Improve overall functionality in use cases with very large artifacts
                 Key: UIMA-2397
                 URL: https://issues.apache.org/jira/browse/UIMA-2397
             Project: UIMA
          Issue Type: Improvement
          Components: TextMarker
            Reporter: Peter Klügl
            Assignee: Peter Klügl


TextMarker is currently not usable in use cases with very large artifacts, e.g., documents with
500k - 1M tokens.
Adapt or replace the rule language so that the user can handle such texts:
- reduce the memory footprint of the TextMarkerBasic inference annotations, or make it
configurable.
- add the concept of simple rules that match only on a single regular expression and add
annotations without creating inference annotations (related to UIMA-2331).
- allow the user to skip seeding at engine startup and to apply the seeders only to
certain annotations during rule inference.
- introduce language concepts that enable the user to split documents into multiple CASs.
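
To make two of the proposed items more concrete, here is a minimal sketch in plain Java of (a) a "simple rule" that is nothing more than one regular expression producing typed spans, and (b) splitting a large text into chunks that are processed separately. It does not use the TextMarker or UIMA API. The names LargeArtifactSketch, Span, split and applySimpleRule are hypothetical and chosen only for illustration, the chunks stand in for separate CASs, and the whitespace-based cut is an assumption rather than a proposed design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names belong to TextMarker or UIMA.
public class LargeArtifactSketch {

    // Plain stand-in for an annotation: a type name plus document-level offsets.
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    // Cuts the text into [begin, end) ranges of at most maxChars characters.
    // Where possible the cut is moved back to the last whitespace so that
    // tokens are not split; a token longer than maxChars is cut hard.
    static List<int[]> split(String text, int maxChars) {
        if (maxChars < 1) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<int[]> chunks = new ArrayList<>();
        int begin = 0;
        while (begin < text.length()) {
            int end = Math.min(begin + maxChars, text.length());
            if (end < text.length()) {
                int ws = end;
                while (ws > begin && !Character.isWhitespace(text.charAt(ws - 1))) {
                    ws--;
                }
                if (ws > begin) {
                    end = ws;
                }
            }
            chunks.add(new int[] {begin, end});
            begin = end;
        }
        return chunks;
    }

    // Applies one regular expression to each chunk on its own (as if each
    // chunk were a separate, smaller artifact) and maps the match offsets
    // back to positions in the full text. No per-token helper objects are
    // created; only the resulting spans are kept.
    static List<Span> applySimpleRule(String text, List<int[]> chunks,
                                      Pattern pattern, String type) {
        List<Span> spans = new ArrayList<>();
        for (int[] chunk : chunks) {
            String chunkText = text.substring(chunk[0], chunk[1]);
            Matcher m = pattern.matcher(chunkText);
            while (m.find()) {
                spans.add(new Span(type, chunk[0] + m.start(), chunk[0] + m.end()));
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "aa 12 bb 345 cc";
        List<int[]> chunks = split(text, 6);
        for (Span s : applySimpleRule(text, chunks, Pattern.compile("\\d+"), "Number")) {
            System.out.println(s.type + " [" + s.begin + ", " + s.end + ") "
                    + text.substring(s.begin, s.end));
        }
    }
}
```

Note that in this sketch a match spanning a chunk boundary would be missed. Any real splitting concept would have to decide how to treat such cases, e.g., by cutting only at boundaries where no rule is expected to match.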

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
