uima-dev mailing list archives

From Peter Klügl (JIRA) <...@uima.apache.org>
Subject [jira] [Comment Edited] (UIMA-4453) MARKTABLE action works improperly
Date Tue, 30 Jun 2015 07:28:07 GMT

    [ https://issues.apache.org/jira/browse/UIMA-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14607596#comment-14607596
] 

Peter Klügl edited comment on UIMA-4453 at 6/30/15 7:27 AM:
------------------------------------------------------------

Yes, the additional configuration parameter is set to false by default and needs to be activated.
In your use case, the best way to do that is probably to change its value in BasicEngine.xml.
This descriptor is used when generating all other descriptors, so the values specified
there are reused.

To do this, you have to:
- open the file descriptor/BasicEngine.xml with the "Component Descriptor Editor" in Eclipse
- switch to the "Parameter Settings" tab
- select the parameter "dictRemoveWS" (normally in the left part) and set its value to true
(normally in the right part)
- save the descriptor and rebuild all descriptors, especially the one that is used to create
the actual analysis engine, e.g., by changing the rule file.
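
For orientation, after these steps the setting would be expected to look roughly like the
fragment below in the XML source of descriptor/BasicEngine.xml. This is only a sketch based on
the standard UIMA descriptor layout, not a copy of the file shipped with the Workbench, so the
surrounding elements and the other parameters in your file may differ. It also assumes that
"dictRemoveWS" is already declared as a Boolean parameter in the same descriptor.

```xml
<!-- Sketch: fragment inside <analysisEngineMetaData> of descriptor/BasicEngine.xml.
     Assumption: other nameValuePair entries exist next to this one and stay untouched. -->
<configurationParameterSettings>
  <nameValuePair>
    <name>dictRemoveWS</name>
    <value>
      <boolean>true</boolean>
    </value>
  </nameValuePair>
</configurationParameterSettings>
```

Checking the "Source" tab of the Component Descriptor Editor for such an entry is a quick way
to confirm that the value was actually saved.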

If there is no "dictRemoveWS" parameter, then you need to update your Ruta project, e.g.,
via "UIMA Ruta -> Convert to UIMA Ruta project" in the popup menu of the project.

It is possible that this basic descriptor is not applied when building the new descriptors.
If the above does not work, you could try setting the parameter value directly in the
descriptor of your rule script. However, that value will be overwritten whenever you save the
script file.





was (Author: pkluegl):
Yes, the additional configuration parameter is set to false by default and need to be activated.
In your use case, the best approach to do that is probably changing its value in the BasicEngine.xml.
This descriptor is applied for generating all descriptors and therefore the values specified
there are reused.

To do this, you have to:
- open th file descriptor/BasicEngine.xml with the "Component Descriptor Editor" in Eclipse
- switch to the "Parameter Settings" tab
- select the parameter "dictRemoveWS" (normally left part) and set its value to true (normally
right part)
- save the descriptor and rebuild all descriptors, especially that one that is used to create
the actual analysis engine, e.g., by changing the rule file.

If there is not "dictRemoveWs" parameter, then you need to update you Ruta project, e.g.,
by UIMA Ruta -> Concert to UIMA Ruta project in the popup menu of a project.

There is a possibility that this basic descriptor is not applied for building the new descriptors.
In case the above does not work, you could try to set the parameter value directly in the
descriptor of your rule script. However, the value will be overridden when you store the script
file.




> MARKTABLE action works improperly
> ---------------------------------
>
>                 Key: UIMA-4453
>                 URL: https://issues.apache.org/jira/browse/UIMA-4453
>             Project: UIMA
>          Issue Type: Bug
>          Components: ruta
>    Affects Versions: 2.3.0ruta
>         Environment: OS X 10.9.1, Java v8u45, Eclipse Luna
> Windows 7, Java v8u45, Eclipse Luna
>            Reporter: Oleg Fedoriaka
>            Assignee: Peter Klügl
>             Fix For: 2.3.1ruta
>
>         Attachments: ruta.zip
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> The newly available UIMA Ruta Runtime 2.7.0 & Workbench 2.3.0 for Eclipse have lost the proper
> functionality of the MARKTABLE action. The action no longer annotates all words from a csv
> file. I noticed that the problem occurs only for entries written in Cyrillic which contain
> spaces, i.e. for Latin it works fine. Please use the sample outlined below to reproduce
> the problem I'm talking about.
> # script/main.ruta
> WORDTABLE Dict = 'dict.csv';
> DECLARE Annotation Test (STRING meaning);
> Document {-> MARKTABLE(Test,1,Dict, "meaning" = 2)};
> # resources/dict.csv
> від;from
> с какой стати;why
> с которой;fromWhich
> сюда;here
> по какому;which
> сюди;here
> как нибудь;somehow
> сколько;howMuch
> # input/test.txt
> від с какой стати с которой сюда по какому сюди как нибудь сколько
> After executing the main.ruta script, not everything in test.txt is annotated. It is worth
> mentioning that a Cyrillic letter like 'с' at the beginning of a string somehow affects the
> processing behavior. Moreover, removing the lines that contain spaces gets rid of the issue
> described above.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
