uima-dev mailing list archives

From "Martin Toepfer (JIRA)" <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-3530) UIMA Rute - allow WORDLIST and WORDTABLE files to include not just plain text to be matched but also regular expressions
Date Wed, 08 Jan 2014 15:23:55 GMT

    [ https://issues.apache.org/jira/browse/UIMA-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865543#comment-13865543 ]

Martin Toepfer commented on UIMA-3530:
--------------------------------------

I've also been thinking about such a feature (for dealing with German inflection) -- maybe
we can find a quick fix.

I've had a look at the source code and agree with Peter that a solution for full-featured
regular expressions is not that simple. Nevertheless, what would you think of something
like a template mechanism? For example:

A dictionary with the entry "kalt$$" could be called from within Ruta like

  Document {->MARKFAST(ADJ, adjList, ..., "$$"=>("","e","er","es","en"))};

which should add "kalt", "kalte", "kalter", "kaltes", "kalten" to the trie.
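
To make the intended expansion concrete, here is a minimal sketch in Python of what
such a template step might do before the entries reach the trie. It is only an
illustration under assumptions: the function names expand_entry and expand_wordlist
are hypothetical and are not part of Ruta, and the sketch does not touch the actual
trie or the MARKFAST implementation.

```python
def expand_entry(entry, substitutions):
    """Expand one templated dictionary entry into its surface forms.

    `substitutions` maps a placeholder (e.g. "$$") to the list of strings
    that may replace it. Both names are hypothetical, for illustration only.
    """
    forms = [entry]
    for placeholder, replacements in substitutions.items():
        expanded = []
        for form in forms:
            if placeholder in form:
                # One new form per replacement string.
                expanded.extend(form.replace(placeholder, r) for r in replacements)
            else:
                # Entries without the placeholder are kept unchanged.
                expanded.append(form)
        forms = expanded
    return forms


def expand_wordlist(entries, substitutions):
    """Expand every entry, dropping duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for entry in entries:
        for form in expand_entry(entry, substitutions):
            if form not in seen:
                seen.add(form)
                result.append(form)
    return result


if __name__ == "__main__":
    endings = {"$$": ["", "e", "er", "es", "en"]}
    for word in expand_wordlist(["kalt$$"], endings):
        print(word)
```

With the mapping from the example above, the entry "kalt$$" yields the five forms
listed; entries without the placeholder pass through unchanged.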

Would that be applicable to your Greek or Russian dictionaries?

(A colleague of mine once used this for modeling adjectives in German terminologies.)

In the end, maybe one should instead think about using stemming or lemmatization (if possible).
Or you could wrap the wordlist creation with your own code.

-- Martin

> UIMA Rute - allow WORDLIST and WORDTABLE files to include not just plain text to be matched
> but also regular expressions
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: UIMA-3530
>                 URL: https://issues.apache.org/jira/browse/UIMA-3530
>             Project: UIMA
>          Issue Type: Wish
>          Components: ruta
>            Reporter: Dimitris Vassos
>            Priority: Minor
>
> It would greatly speed up and simplify the implementation of dictionary lookups using
> WORDLIST and WORDTABLE, if instead of just plain text entries in the file we could enter regular
> expressions.
> Especially for inflectional languages such as Greek or Russian, this feature is invaluable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
