[ https://issues.apache.org/jira/browse/UIMA-5680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Klügl updated UIMA-5680:
------------------------------
Fix Version/s: 2.6.2ruta
> Special characters in MARKFAST dictionaries mask entries
> --------------------------------------------------------
>
> Key: UIMA-5680
> URL: https://issues.apache.org/jira/browse/UIMA-5680
> Project: UIMA
> Issue Type: Bug
> Components: Ruta
> Affects Versions: 2.6.1ruta
> Reporter: Hugues de Mazancourt
> Assignee: Peter Klügl
> Priority: Major
> Fix For: 2.6.2ruta
>
> Attachments: Slash.ruta, dict.txt, text.txt
>
>
> It seems that when two entries in a MARKFAST dictionary differ only by a special character,
> MARKFAST ignores some of them:
> My script is:
> DECLARE AndOr;
> Document{->MARKFAST(AndOr, 'dict.txt', true)};
> My dict.txt contains:
> and/or
> and or
> On the following text: "knowledge of java and/or php and or Groovy is a plus", only
> the second entry, "and or" (without the slash), is marked. If I remove the unslashed entry from
> dict.txt, "and/or" is correctly marked.
> This also happens with other separators, such as "+" or ".", and even when two entries merely
> share a prefix. For example, if you add "and/or php" to dict.txt, it is not marked.
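For reference, the behavior the reporter expects can be sketched in Python. This is not Ruta's MARKFAST implementation, which the report does not show. It is a minimal sketch assuming a character-level trie in which every entry keeps its own end marker, with token boundaries approximated by requiring non-alphanumeric neighbors. The names `build_trie`, `find_all`, and `END` are hypothetical. Because the lookup walks the trie as far as it can and records every end marker it passes, an entry such as "and/or" is never hidden by "and or" or by "and/or php".

```python
from typing import Dict, List, Tuple

END = "$end"  # hypothetical sentinel key; never equal to a single text character


def build_trie(entries: List[str], ignore_case: bool = True) -> Dict:
    """Build a character-level trie where each entry keeps its own end marker."""
    root: Dict = {}
    for entry in entries:
        node = root
        for ch in entry.lower() if ignore_case else entry:
            node = node.setdefault(ch, {})
        node[END] = entry  # marking the end never overwrites sibling branches
    return root


def _at_boundary(text: str, begin: int, end: int) -> bool:
    """Assumed token check: match must not start or stop inside a word."""
    left_ok = begin == 0 or not text[begin - 1].isalnum()
    right_ok = end == len(text) or not text[end].isalnum()
    return left_ok and right_ok


def find_all(text: str, trie: Dict, ignore_case: bool = True) -> List[Tuple[int, int, str]]:
    """Return (begin, end, covered text) for every dictionary entry found in text."""
    haystack = text.lower() if ignore_case else text
    matches: List[Tuple[int, int, str]] = []
    for start in range(len(haystack)):
        node = trie
        pos = start
        # Keep walking after an entry ends, so a longer entry ("and/or php")
        # is found too, and a shorter one is still recorded on the way.
        while pos < len(haystack) and haystack[pos] in node:
            node = node[haystack[pos]]
            pos += 1
            if END in node and _at_boundary(text, start, pos):
                matches.append((start, pos, text[start:pos]))
    return matches
```

Usage, with the dictionary and text from the report:

```python
trie = build_trie(["and/or", "and or", "and/or php"])
print(find_all("knowledge of java and/or php and or Groovy is a plus", trie))
# [(18, 24, 'and/or'), (18, 28, 'and/or php'), (29, 35, 'and or')]
```

The sketch reports overlapping matches; whether MARKFAST should also mark "and/or" inside "and/or php" is not settled by this issue.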
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)