uima-dev mailing list archives

From Jérôme Rocheteau (JIRA) <uima-...@incubator.apache.org>
Subject [jira] Updated: (UIMA-1447) Tabulations are annotated as tokens after a space
Date Thu, 23 Jul 2009 14:47:17 GMT

     [ https://issues.apache.org/jira/browse/UIMA-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jérôme Rocheteau updated UIMA-1447:

    Attachment: patch-an-wst.txt

I suggest this patch: it merely checks that the current character is not whitespace before
a token annotation is created for a special character.
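
To illustrate the idea, here is a minimal sketch of such a guard. It is not the code from patch-an-wst.txt and not the actual WhitespaceTokenizer source: the class WhitespaceGuardSketch and its methods tokenize and coveredTexts are hypothetical names, and the scanner is deliberately simplified (letters and digits form words, every other non-whitespace character becomes a one-character token).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified scanner; NOT the Sandbox WhitespaceTokenizer itself.
class WhitespaceGuardSketch {

    /** Returns the [begin, end) offsets of every token found in text. */
    static List<int[]> tokenize(String text) {
        List<int[]> tokens = new ArrayList<>();
        int wordStart = -1; // -1 means no word is currently open
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                if (wordStart < 0) {
                    wordStart = i; // open a new word
                }
                continue;
            }
            // Any non-word character closes the word in progress.
            if (wordStart >= 0) {
                tokens.add(new int[] {wordStart, i});
                wordStart = -1;
            }
            // The guard: a "special character" token is created only when
            // the current character is not whitespace (space, '\t', ...).
            if (!Character.isWhitespace(c)) {
                tokens.add(new int[] {i, i + 1});
            }
        }
        if (wordStart >= 0) {
            tokens.add(new int[] {wordStart, text.length()});
        }
        return tokens;
    }

    /** Convenience helper: the covered text of each token. */
    static List<String> coveredTexts(String text) {
        List<String> covered = new ArrayList<>();
        for (int[] t : tokenize(text)) {
            covered.add(text.substring(t[0], t[1]));
        }
        return covered;
    }
}
```

With this sketch, coveredTexts("follows: \ti.e.") yields the tokens follows, :, i, ., e and a final period; neither the space nor the tab that follows it produces a token, so no token has empty covered text.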

> Tabulations are annotated as tokens after a space
> -------------------------------------------------
>                 Key: UIMA-1447
>                 URL: https://issues.apache.org/jira/browse/UIMA-1447
>             Project: UIMA
>          Issue Type: Bug
>          Components: Sandbox-WhitespaceTokenizer
>    Affects Versions: 2.3S
>         Environment: Unix (ubuntu 8.04), Eclipse Galileo 3.5
>            Reporter: Jérôme Rocheteau
>         Attachments: patch-an-wst.txt
> This is a test-text for the Whitespace Tokenizer in the UIMA Sandbox. 
> It behaves as follows: 	i.e. a '\t' character after a space is 
> annotated as a token and its covered text is set to the empty string ""! 
> I suppose it shouldn't be the case, am I wrong?

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
