uima-dev mailing list archives

From "Oleg Fedoriaka (JIRA)" <...@uima.apache.org>
Subject [jira] [Updated] (UIMA-4453) MARKTABLE action works improperly
Date Wed, 10 Jun 2015 07:50:00 GMT

     [ https://issues.apache.org/jira/browse/UIMA-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Oleg Fedoriaka updated UIMA-4453:
---------------------------------
      Description: 
The newly released UIMA Ruta Runtime 2.7.0 & Workbench 2.3.0 for Eclipse has broken the
MARKTABLE action: it no longer annotates all entries from a CSV file. I noticed that the
problem occurs only for Cyrillic entries that contain spaces; for Latin-script entries it
works fine. Please use the sample below to reproduce the problem.

# script/main.ruta
WORDTABLE Dict = 'dict.csv';
DECLARE Annotation Test (STRING meaning);
Document {-> MARKTABLE(Test,1,Dict, "meaning" = 2)};

# resources/dict.csv
від;from
с какой стати;why
с которой;fromWhich
сюда;here
по какому;which
сюди;here
как нибудь;somehow
сколько;howMuch

# input/test.txt
від с какой стати с которой сюда по какому сюди как нибудь сколько
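For reference, a minimal Python sketch (not Ruta's implementation) of how each
semicolon-separated row of dict.csv splits into the lookup entry from column 1 and the
"meaning" value from column 2, as MARKTABLE(Test,1,Dict, "meaning" = 2) requests. It also
flags which entries span several whitespace-separated words, i.e. the ones reported as
failing. The helper name parse_table is hypothetical.

```python
# Hypothetical helper: parse the semicolon-separated rows of dict.csv.
# Column 1 is the text to look up, column 2 is the value for "meaning".
DICT_CSV = """\
від;from
с какой стати;why
с которой;fromWhich
сюда;here
по какому;which
сюди;here
как нибудь;somehow
сколько;howMuch
"""


def parse_table(text: str) -> list[tuple[str, str]]:
    """Return (entry, meaning) pairs, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        entry, meaning = line.split(";", 1)
        rows.append((entry.strip(), meaning.strip()))
    return rows


if __name__ == "__main__":
    for entry, meaning in parse_table(DICT_CSV):
        # Entries containing a space cover more than one token in the document.
        kind = "multi-word" if len(entry.split()) > 1 else "single word"
        print(f"{entry!r} -> {meaning} ({kind})")
```

In this sample, four of the eight entries are multi-word, which matches the observation
that removing the lines with spaces avoids the problem.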

After executing main.ruta, not everything in test.txt gets annotated. It is worth mentioning
that a Cyrillic letter such as 'с' at the beginning of an entry somehow affects the processing
behavior. Moreover, removing the lines that contain spaces gets rid of the issue described
above.
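To make "everything" concrete, here is a minimal Python sketch of the annotations one would
expect for input/test.txt, assuming whitespace tokenization and greedy longest-match lookup
against the table. This is an illustration of the expected result only, not how UIMA Ruta
implements MARKTABLE; the names TABLE, TEXT and expected_matches are hypothetical.

```python
# Illustration only: expected MARKTABLE matches for input/test.txt, assuming
# whitespace tokenization and greedy longest match. NOT Ruta's implementation.
TABLE = {
    "від": "from",
    "с какой стати": "why",
    "с которой": "fromWhich",
    "сюда": "here",
    "по какому": "which",
    "сюди": "here",
    "как нибудь": "somehow",
    "сколько": "howMuch",
}

# str.split() treats newlines like spaces, so a line break inside
# "как нибудь" would not change the result of this sketch.
TEXT = "від с какой стати с которой сюда по какому сюди как нибудь сколько"


def expected_matches(text: str, table: dict[str, str]) -> list[tuple[str, str]]:
    """Return (covered text, meaning) pairs, scanning left to right."""
    tokens = text.split()
    # The longest entry (in tokens) bounds how far ahead we look.
    max_len = max(len(entry.split()) for entry in table)
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest window first so multi-word entries are preferred.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in table:
                matches.append((candidate, table[candidate]))
                i += n
                break
        else:
            i += 1  # no table entry starts at this token
    return matches


if __name__ == "__main__":
    for covered, meaning in expected_matches(TEXT, TABLE):
        print(f"Test[{covered}] meaning={meaning}")
```

Under these assumptions every one of the eight table rows is matched once; comparing this
list with the Test annotations Ruta actually creates shows which entries are missed.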

  was:
The newly released UIMA Ruta Runtime & Workbench 2.3 for Eclipse has broken the MARKTABLE
action.

It no longer annotates all words from a CSV file. I noticed that the problem occurs only for
Cyrillic entries that contain spaces; for Latin-script entries it works fine. Please use the
sample below to reproduce the problem.

# script/main.ruta
WORDTABLE Dict = 'dict.csv';
DECLARE Annotation Test (STRING meaning);
Document {-> MARKTABLE(Test,1,Dict, "meaning" = 2)};

# resources/dict.csv
від;from
с какой стати;why
с которой;fromWhich
сюда;here
по какому;which
сюди;here
как нибудь;somehow
сколько;howMuch

# input/test.txt
від с какой стати с которой сюда по какому сюди как нибудь сколько

After executing main.ruta, not everything in test.txt gets annotated. It is worth mentioning
that a Cyrillic letter such as 'с' at the beginning of an entry somehow affects the processing
behavior. Moreover, removing the lines that contain spaces gets rid of the issue described
above.

    Fix Version/s:     (was: 2.3.1ruta)
                   2.2.0ruta

> MARKTABLE action works improperly
> ---------------------------------
>
>                 Key: UIMA-4453
>                 URL: https://issues.apache.org/jira/browse/UIMA-4453
>             Project: UIMA
>          Issue Type: Bug
>          Components: ruta
>    Affects Versions: 2.3.0ruta
>         Environment: OS X 10.9.1, Java v8u45, Eclipse Luna
> Windows 7, Java v8u45, Eclipse Luna
>            Reporter: Oleg Fedoriaka
>            Assignee: Peter Klügl
>             Fix For: 2.2.0ruta
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
