uima-dev mailing list archives

From "Jasper Huzen (JIRA)" <...@uima.apache.org>
Subject [jira] [Created] (UIMA-5775) Performance problem MARKTABLE when matching case insensitive
Date Mon, 14 May 2018 12:46:00 GMT
Jasper Huzen created UIMA-5775:

             Summary: Performance problem MARKTABLE when matching case insensitive
                 Key: UIMA-5775
                 URL: https://issues.apache.org/jira/browse/UIMA-5775
             Project: UIMA
          Issue Type: Bug
          Components: Ruta
    Affects Versions: 2.6.1ruta
            Reporter: Jasper Huzen


We encounter a performance issue (or possibly an infinite loop) when we use the MARKTABLE action
with case-insensitive value lists.

The call in our script is:
{code}
MARKTABLE(LawName, 1, 'nl_law_names.ignorecase.csv', true, 0, "", 0, "lawIdentifier" = 2);
{code}

The following input fragment results in a timeout exception after 1 minute:
{code}
Groenboek COM(2006) 105 definitief een Europese strategie voor duurzame, concurrerende en
continu geleverde energie voor Europa {SEC(2006)317}
{code}
The complete name is a Dutch law name and is also an entry in the _nl_law_names.csv_ file.

When we try to match it with the ignoreCase flag set to false, there is no problem and matching is fast.
If we set the flag to true (case is ignored), the matching is very slow or even hangs
in an infinite loop.

I debugged the code, which pointed me to the _TreeWordList_ class. The recursive method _recursiveContains_ has
a potential bug.

I think the problem occurs when the item contains a special character that is the same
in upper and lower case. The recursive method will then look/fork twice on the same tree item.

I made a fix that checks whether the uppercase character is the same as the lowercase character and, in that
case, only does the recursive call once. That solved the performance issue, but I'm not sure
whether this is really the main problem and whether the current fix is the best fix for this.
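
To illustrate the suspected mechanism, here is a minimal sketch of a character trie with a case-insensitive recursive lookup. It is not the actual _TreeWordList_ code; the class name CaseForkSketch, the method walk and the flag skipDuplicateBranch are hypothetical names chosen for this example. In the sketch, a lookup that ultimately fails revisits the same child node twice for every character whose upper and lower case are identical (digits, spaces, brackets), so the number of recursive calls roughly doubles per such character. Skipping the second call when both cases are equal keeps the number of calls linear in the input length.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; NOT the Ruta TreeWordList implementation.
final class CaseForkSketch {

    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean wordEnd;
    }

    final Node root = new Node();
    long calls; // number of recursive invocations in the last lookup

    void add(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.wordEnd = true;
    }

    // Case-insensitive lookup; skipDuplicateBranch toggles the proposed fix.
    boolean contains(String text, boolean skipDuplicateBranch) {
        calls = 0;
        return walk(root, text, 0, skipDuplicateBranch);
    }

    private boolean walk(Node node, String text, int index, boolean skipDuplicateBranch) {
        calls++;
        if (node == null) {
            return false;
        }
        if (index == text.length()) {
            return node.wordEnd;
        }
        char c = text.charAt(index);
        char lower = Character.toLowerCase(c);
        char upper = Character.toUpperCase(c);
        boolean found = walk(node.children.get(lower), text, index + 1, skipDuplicateBranch);
        // Without the fix, a caseless character (lower == upper) makes this
        // second call descend into exactly the same child node again.
        if (!found && !(skipDuplicateBranch && lower == upper)) {
            found = walk(node.children.get(upper), text, index + 1, skipDuplicateBranch);
        }
        return found;
    }

    public static void main(String[] args) {
        CaseForkSketch list = new CaseForkSketch();
        list.add("(2006) 105 (317)!");
        String almost = "(2006) 105 (317)?"; // differs only in the last character

        list.contains(almost, false);
        System.out.println("calls without fix: " + list.calls);
        list.contains(almost, true);
        System.out.println("calls with fix:    " + list.calls);
    }
}
```

In this sketch the blow-up only shows on lookups that fail after a long run of caseless characters, because a successful first branch short-circuits the second call. Whether the real _recursiveContains_ behaves the same way would need to be checked against the Ruta source.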
