uima-dev mailing list archives

From Peter Klügl (JIRA) <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-5723) MARKTABLE fails to assign feature for single word entry in first CSV column
Date Tue, 20 Feb 2018 09:01:00 GMT

    [ https://issues.apache.org/jira/browse/UIMA-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16369827#comment-16369827 ]

Peter Klügl commented on UIMA-5723:
-----------------------------------

I actually answered a question about the same problem for someone else off-list last week.
Take a look at the filtering settings or the filtered chars in the MARKTABLE action. The match/lookup
is more flexible than the lookup for the feature. The matched text needs to represent the same string
as the one given in the row for the feature, or more precisely, in the row that matched during the
dictionary lookup.
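
To illustrate the kind of mismatch described above, here is a minimal sketch in plain Java. It is not Ruta's implementation; the class name, the method names and the example text "W.A.Z." are assumptions made up for illustration. It only shows how a match step that ignores case and certain characters can accept a text while an exact-string feature lookup for that same text finds no row.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only, not Ruta code.
class LookupMismatchSketch {

    // First CSV column -> second CSV column, taken from the issue's example table.
    static final Map<String, String> ROWS = new HashMap<>();
    static {
        ROWS.put("WAZ", "WAZELF");
        ROWS.put("Wet arbeidsongeschiktheidsverzekering zelfstandigen", "WAZELF");
    }

    // Flexible normalization used for matching: drop ignored chars, ignore case.
    static String normalize(String s, String ignoredChars) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (ignoredChars.indexOf(c) < 0) {
                sb.append(Character.toLowerCase(c));
            }
        }
        return sb.toString();
    }

    // Flexible match: does any table entry equal the text after normalization?
    static boolean matches(String text, String ignoredChars) {
        String key = normalize(text, ignoredChars);
        for (String entry : ROWS.keySet()) {
            if (normalize(entry, ignoredChars).equals(key)) {
                return true;
            }
        }
        return false;
    }

    // Strict feature lookup: requires the text to equal the row entry exactly.
    static String strictFeature(String text) {
        return ROWS.get(text);
    }

    // Consistent feature lookup: applies the same normalization as the match.
    static String consistentFeature(String text, String ignoredChars) {
        String key = normalize(text, ignoredChars);
        for (Map.Entry<String, String> row : ROWS.entrySet()) {
            if (normalize(row.getKey(), ignoredChars).equals(key)) {
                return row.getValue();
            }
        }
        return null;
    }
}
```

In this sketch, a text such as "W.A.Z." with "." as an ignored char passes `matches`, but `strictFeature` returns null for it, which corresponds to an annotation being created while its feature stays empty. Whether this is exactly what happens for {{WAZ}} in the reported case depends on the actual filtering settings.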

 

There are several problems and flaws with the Ruta wordlists and wordtables which cause problems
all the time, partly because they are more powerful than similar dictionary lookups. In order
to avoid that, I wrote some simple dictionary lookup code which fixes exactly those flaws.
It is not compatible with the Ruta code, but it is much simpler and more maintainable. By now,
I do not use the Ruta functionality at all (I actually consider it deprecated) and only use my
simple dictionary. I will contribute the code when I find the time, but I also need to find
a good design for including it in Ruta.

> MARKTABLE fails to assign feature for single word entry in first CSV column
> ---------------------------------------------------------------------------
>
>                 Key: UIMA-5723
>                 URL: https://issues.apache.org/jira/browse/UIMA-5723
>             Project: UIMA
>          Issue Type: Bug
>          Components: Ruta
>    Affects Versions: 2.6.1ruta
>            Reporter: Andreas Thiel
>            Assignee: Peter Klügl
>            Priority: Major
>
> When using Ruta's MARKTABLE action with a CSV file {{nl_law_names.csv}} like this
> {code:xml}
> WAZ;WAZELF
> Wet arbeidsongeschiktheidsverzekering zelfstandigen;WAZELF
> {code}
> and corresponding Ruta script containing these lines
> {code:java}
> WORDTABLE LawNameTable = 'nl_law_names.csv';
> Document{->MARKTABLE(WetNaam, 1, LawNameTable, "WetIdentifier" = 2)};
> {code}
> it seems that the text {{WAZ}} is detected, but the {{WetIdentifier}} feature of the
> resulting annotation is not filled by the string following the semicolon. Instead, it remains
> empty.
> (Note: _WetNaam_ annotation is defined elsewhere via type system description)
> In contrast, the fully written name {{Wet arbeidsongeschiktheidsverzekering zelfstandigen}}
> is detected and processed as expected with feature WetIdentifier = WAZELF after annotating.
> Could it be that problems arise when only a single word (i.e. no spaces or uppercase
> letters following lowercase chars) is present in the first column in the CSV file? Or is it
> a matter of configuration?
> We experimented also with the optional arguments of MARKTABLE regarding uppercase/lowercase
> distinction, but to no avail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
