uima-dev mailing list archives

From Peter Klügl (JIRA) <...@uima.apache.org>
Subject [jira] [Assigned] (UIMA-5775) Performance problem MARKTABLE when matching case insensitive
Date Mon, 14 May 2018 13:01:00 GMT

     [ https://issues.apache.org/jira/browse/UIMA-5775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Klügl reassigned UIMA-5775:

    Assignee: Peter Klügl

> Performance problem MARKTABLE when matching case insensitive
> ------------------------------------------------------------
>                 Key: UIMA-5775
>                 URL: https://issues.apache.org/jira/browse/UIMA-5775
>             Project: UIMA
>          Issue Type: Bug
>          Components: Ruta
>    Affects Versions: 2.6.1ruta
>            Reporter: Jasper Huzen
>            Assignee: Peter Klügl
>            Priority: Major
>             Fix For: 2.6.2ruta
>         Attachments: UIMA-5775.patch
> Hi,
> We encounter a performance issue (or possibly an infinite loop) when we use the MARKTABLE action with case-insensitive value lists.
> The call in our script is:
> {code:java}
> MARKTABLE(LawName, 1, 'nl_law_names.ignorecase.csv', true, 0, "", 0, "lawIdentifier" = 2);{code}
> Using the following input fragment will result in a timeout exception after 1 minute.
> {code:java}
> Groenboek COM(2006) 105 definitief een Europese strategie voor duurzame, concurrerende en continu geleverde energie voor Europa {SEC(2006)317}{code}
> That complete name is a Dutch law name and is also an entry in _nl_law_names.csv_.
> When we try to match it with the ignoreCase flag set to false, there is no problem and matching is fast. If we set that flag to true (case is ignored), matching becomes very slow or even appears to hang in an infinite loop.
> I debugged the code, which pointed me to the _TreeWordList_ class. The recursive method _recursiveContains_ has a potential bug.
> I think the problem occurs when the item contains a special character whose uppercase and lowercase forms are identical. The recursive method then looks up (forks into) the same tree item twice.
> I made a fix that checks whether the uppercase character is the same as the lowercase character, and in that case makes the recursive call only once. That solved the (performance) issue, but I'm not sure whether this is really the main problem or whether the current fix is the best one.
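
To illustrate the suspected behaviour, here is a minimal sketch of a case-insensitive trie lookup. It is not the actual _TreeWordList_ code: the class name, the Node type, the skipDuplicateBranch flag and the calls counter are all hypothetical and exist only for this example. The sketch assumes the lookup tries the lowercase and the uppercase variant of each character in turn. For characters without case (digits, brackets, spaces) both variants lead to the same child, so a failing lookup may visit the same subtree twice per such character, which doubles the work at every level.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; NOT the real Ruta TreeWordList implementation.
class CaseInsensitiveTrieSketch {

    /** Hypothetical trie node. */
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean wordEnd;
    }

    final Node root = new Node();
    long calls; // counts recursive invocations, for illustration only

    void add(String word) {
        Node n = root;
        for (char c : word.toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        n.wordEnd = true;
    }

    /** Case-insensitive lookup; the flag switches the proposed check on or off. */
    boolean contains(String text, boolean skipDuplicateBranch) {
        calls = 0;
        return recursiveContains(root, text, 0, skipDuplicateBranch);
    }

    private boolean recursiveContains(Node node, String text, int index, boolean skip) {
        calls++;
        if (index == text.length()) {
            return node.wordEnd;
        }
        char c = text.charAt(index);
        char lower = Character.toLowerCase(c);
        char upper = Character.toUpperCase(c);

        // First branch: follow the lowercase variant.
        Node lowerChild = node.children.get(lower);
        if (lowerChild != null && recursiveContains(lowerChild, text, index + 1, skip)) {
            return true;
        }
        // Proposed check: if both variants are the same character,
        // the second branch would revisit the same child, so stop here.
        if (skip && upper == lower) {
            return false;
        }
        // Second branch: follow the uppercase variant.
        Node upperChild = node.children.get(upper);
        return upperChild != null && recursiveContains(upperChild, text, index + 1, skip);
    }

    public static void main(String[] args) {
        CaseInsensitiveTrieSketch trie = new CaseInsensitiveTrieSketch();
        trie.add("1234567890x");

        trie.contains("1234567890y", false); // fails after ten caseless characters
        System.out.println("without check: " + trie.calls + " calls"); // 2047

        trie.contains("1234567890y", true);
        System.out.println("with check: " + trie.calls + " calls"); // 11
    }
}
```

In this sketch ten caseless characters already turn 11 calls into 2047 for a lookup that fails at the end. An entry such as the law name above contains many more digits, spaces and brackets, so growth of this kind would be consistent with the reported timeout. Whether the real method short-circuits in the same way is an assumption here.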

This message was sent by Atlassian JIRA
