uima-dev mailing list archives

From "Andreas Thiel (JIRA)" <...@uima.apache.org>
Subject [jira] [Commented] (UIMA-5723) MARKTABLE fails to assign feature for single word entry in first CSV column
Date Tue, 20 Feb 2018 16:52:02 GMT

    [ https://issues.apache.org/jira/browse/UIMA-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16370252#comment-16370252 ]

Andreas Thiel commented on UIMA-5723:

I was finally able to pin down the factor that caused the misbehavior. MARKTABLE started
to behave as expected once I created the CAS with the {{TypePriorities}} argument
of {{CasCreationUtils.createCas}} set to the value returned by the standard {{TypePrioritiesFactory.createTypePriorities()}}.
Previously, this argument had been set to {{null}}, probably because the code was copied from some
place on the internet without understanding the role of the arguments. To be honest,
I still don't understand the role of _TypePriorities_ or why they alter the outcome of the
feature assignment in MARKTABLE. Maybe you can explain that?
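
One possible clue on the _TypePriorities_ question, not confirmed anywhere in this thread: the UIMA annotation index sorts annotations by begin offset (ascending), then end offset (descending), and then by type priority. Without type priorities, the relative order of two annotations of different types covering exactly the same text is not defined, which could plausibly matter for a single-word table entry. The following is only a minimal, self-contained sketch of that ordering rule in plain Java. It does not use the UIMA API; the {{Span}} class, the helper methods, and the priority list shown are hypothetical illustrations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class TypePrioritySketch {

    /** Hypothetical stand-in for an annotation: a type name plus begin/end offsets. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Position of a type in the priority list; unlisted types all share the lowest rank. */
    static int rank(List<String> priorityList, String type) {
        int index = priorityList.indexOf(type);
        return index < 0 ? Integer.MAX_VALUE : index;
    }

    /** Begin ascending, end descending, then type priority (earlier in the list sorts first). */
    static Comparator<Span> annotationOrder(List<String> priorityList) {
        return (a, b) -> {
            if (a.begin != b.begin) {
                return Integer.compare(a.begin, b.begin);
            }
            if (a.end != b.end) {
                return Integer.compare(b.end, a.end);
            }
            return Integer.compare(rank(priorityList, a.type), rank(priorityList, b.type));
        };
    }

    /** Returns the type names in sorted order; the input list is left untouched. */
    static List<String> sortedTypes(List<Span> spans, List<String> priorityList) {
        List<Span> copy = new ArrayList<>(spans);
        copy.sort(annotationOrder(priorityList)); // stable sort: ties keep input order
        List<String> types = new ArrayList<>();
        for (Span span : copy) {
            types.add(span.type);
        }
        return types;
    }

    public static void main(String[] args) {
        // Two annotations covering the same single word, plus a longer one.
        List<Span> spans = Arrays.asList(
                new Span("W", 0, 3),
                new Span("WetNaam", 0, 3),
                new Span("Document", 0, 10));

        // With a (hypothetical) priority list, the tie between W and WetNaam is resolved.
        System.out.println(sortedTypes(spans, Arrays.asList("Document", "WetNaam", "W")));

        // With no priorities, the tie is unresolved; this sketch merely keeps input order.
        System.out.println(sortedTypes(spans, new ArrayList<String>()));
    }
}
```

In this sketch the two same-span entries swap places depending on whether a priority list is supplied. Whether MARKTABLE actually depends on this ordering is an assumption that would need confirmation from the Ruta developers.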

If you think that is the normal and expected behavior, I would opt for closing this ticket.

Regarding the possible replacement of wordtables: our system now relies heavily on the capability
of assigning feature values taken from the table, so whatever the replacement will be, please
make sure that this capability is retained.

> MARKTABLE fails to assign feature for single word entry in first CSV column
> ---------------------------------------------------------------------------
>                 Key: UIMA-5723
>                 URL: https://issues.apache.org/jira/browse/UIMA-5723
>             Project: UIMA
>          Issue Type: Bug
>          Components: Ruta
>    Affects Versions: 2.6.1ruta
>            Reporter: Andreas Thiel
>            Assignee: Peter Klügl
>            Priority: Major
> When using Ruta's MARKTABLE action with a CSV file {{nl_law_names.csv}} like this
> {code:xml}
> Wet arbeidsongeschiktheidsverzekering zelfstandigen;WAZELF
> {code}
> and corresponding Ruta script containing these lines
> {code:java}
> WORDTABLE LawNameTable = 'nl_law_names.csv';
> Document{->MARKTABLE(WetNaam, 1, LawNameTable, "WetIdentifier" = 2)};
> {code}
> it seems that the text {{WAZ}} is detected, but the {{WetIdentifier}} feature of the
resulting annotation is not filled by the string following the semicolon. Instead, it remains
> (Note: _WetNaam_ annotation is defined elsewhere via type system description)
> In contrast, the fully written name {{Wet arbeidsongeschiktheidsverzekering zelfstandigen}}
is detected and processed as expected with feature WetIdentifier = WAZELF after annotating.
> Could it be that problems arise when only a single word (i.e. no spaces or uppercase
letters following lowercase chars) is present in the first column in the CSV file? Or is it
a matter of configuration?
> We experimented also with the optional arguments of MARKTABLE regarding uppercase/lowercase
distinction, but to no avail.

This message was sent by Atlassian JIRA
