uima-dev mailing list archives

From "Pepi Stavropoulou (JIRA)" <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-3530) UIMA Rute - allow WORDLIST and WORDTABLE files to include not just plain text to be matched but also regular expressions
Date Thu, 09 Jan 2014 14:16:55 GMT

    [ https://issues.apache.org/jira/browse/UIMA-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866666#comment-13866666 ]

Pepi Stavropoulou commented on UIMA-3530:
-----------------------------------------

Many thanks for the workaround suggestion.
If I understand it correctly, it is not exactly what we are looking for: we need to map
the "$$" placeholder to different endings depending on the lemma type or word form. So we
would need different placeholders, each mapped to its own set of endings.
E.g.
"kalt$TemplateA$" where "$TemplateA$" => ("", "e", "er", "es", "en")
"spiel$TemplateB$" where "$TemplateB$" => ("e", "st", "st", "en")
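
To make the placeholder idea concrete, here is a minimal sketch in Python of how such entries might be expanded before the dictionary is built. It is not Ruta functionality: the TEMPLATES table, the trailing "$Name$" syntax and the expand_entry helper are assumptions made up for illustration, with the ending sets copied from the example above.

```python
import re

# Hypothetical template table: placeholder name -> set of endings.
TEMPLATES = {
    "TemplateA": ["", "e", "er", "es", "en"],
    "TemplateB": ["e", "st", "st", "en"],
}

# Assumed syntax: a single "$Name$" placeholder at the end of the entry.
_PLACEHOLDER = re.compile(r"\$(\w+)\$$")


def expand_entry(entry, templates=TEMPLATES):
    """Return the word forms for one entry; plain entries pass through unchanged."""
    match = _PLACEHOLDER.search(entry)
    if match is None:
        return [entry]
    stem = entry[:match.start()]
    endings = templates[match.group(1)]
    # dict.fromkeys drops repeated endings while keeping their order.
    return [stem + ending for ending in dict.fromkeys(endings)]


if __name__ == "__main__":
    for line in ("kalt$TemplateA$", "spiel$TemplateB$", "Haus"):
        print(line, "->", expand_entry(line))
```

The expanded forms could then be written out as an ordinary plain-text word list, so the existing lookup would not need to know about templates at all.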

Would it be possible, as a temporary solution, to use regular expressions in the dictionary,
expand them into separate entries as a preprocessing step, and then continue building the
trie as usual?
For example, the regex kalt(e|er|es|en)? would be expanded into the separate entries kalt,
kalte, kalter, kaltes and kalten, all sharing the same features.
I understand this can be time- and memory-consuming, but these would be simple regexes,
possibly with the * and + operators disallowed.
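
The preprocessing step could look roughly like the Python sketch below. It is only an illustration under stated assumptions: the supported syntax is limited to literal text plus non-nested groups such as (a|b|c) with an optional trailing "?", and the names expand_simple_regex and expand_dictionary are hypothetical, not part of Ruta.

```python
import itertools
import re

# One non-nested group such as "(e|er|es|en)", optionally followed by "?".
_GROUP = re.compile(r"\(([^()]*)\)(\?)?")


def expand_simple_regex(pattern):
    """Expand a restricted regex into the list of strings it matches."""
    if any(ch in pattern for ch in "*+"):
        raise ValueError("unbounded operators are not supported: " + pattern)
    parts = []  # one list of alternatives per segment of the pattern
    pos = 0
    for match in _GROUP.finditer(pattern):
        if match.start() > pos:
            parts.append([pattern[pos:match.start()]])  # literal text
        alternatives = match.group(1).split("|")
        if match.group(2):  # a trailing "?" makes the whole group optional
            alternatives = [""] + alternatives
        parts.append(alternatives)
        pos = match.end()
    if pos < len(pattern):
        parts.append([pattern[pos:]])
    forms = []
    for combo in itertools.product(*parts):
        word = "".join(combo)
        if word not in forms:  # keep first occurrence, preserve order
            forms.append(word)
    return forms


def expand_dictionary(entries):
    """Map every expanded form to the features of the pattern it came from."""
    return {
        form: features
        for pattern, features in entries
        for form in expand_simple_regex(pattern)
    }


if __name__ == "__main__":
    print(expand_simple_regex("kalt(e|er|es|en)?"))
```

Because * and + are rejected, every pattern expands to a finite list, and the number of entries is bounded by the product of the group sizes.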

> UIMA Rute - allow WORDLIST and WORDTABLE files to include not just plain text to be matched
> but also regular expressions
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: UIMA-3530
>                 URL: https://issues.apache.org/jira/browse/UIMA-3530
>             Project: UIMA
>          Issue Type: Wish
>          Components: ruta
>            Reporter: Dimitris Vassos
>            Priority: Minor
>
> It would greatly speed up and simplify the implementation of dictionary lookups using
> WORDLIST and WORDTABLE, if instead of just plain text entries in the file we could enter
> regular expressions.
> Especially for inflectional languages such as Greek or Russian, this feature is invaluable.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
