uima-dev mailing list archives

From Jörn Kottmann (JIRA) <...@uima.apache.org>
Subject [jira] Commented: (UIMA-1782) Encoding of text files during import should be configurable
Date Mon, 17 May 2010 23:40:43 GMT

    [ https://issues.apache.org/jira/browse/UIMA-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12868448#action_12868448 ]

Jörn Kottmann commented on UIMA-1782:
-------------------------------------

There is now an option to specify the encoding of the text import files. It is always preset
to the default platform encoding. The combo box lists the standard Java charsets (see here:
http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/Charset.html).
If the user wants a charset that is not one of the standard ones (most Java platforms provide
several more), they have to type in its name. The name is validated as it is typed: if the
charset is available, the user can proceed with the import; otherwise the "Apply" button
just stays disabled.
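A minimal sketch of how such a check could work with the standard java.nio API, assuming Java 7 or later (for StandardCharsets). This is not the CasEditor code; the class CharsetValidationSketch and the helper isValidCharsetName are hypothetical names. Charset.isSupported returns false for a legal but unavailable name and throws IllegalCharsetNameException for a syntactically illegal one, so both cases map to "Apply disabled".

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class CharsetValidationSketch {

    // The six charsets every Java platform is required to support (see the Charset javadoc).
    static final List<String> STANDARD_CHARSETS = Arrays.asList(
            StandardCharsets.US_ASCII.name(),
            StandardCharsets.ISO_8859_1.name(),
            StandardCharsets.UTF_8.name(),
            StandardCharsets.UTF_16BE.name(),
            StandardCharsets.UTF_16LE.name(),
            StandardCharsets.UTF_16.name());

    // Hypothetical helper: true if the typed name denotes a charset available on this
    // platform, i.e. the "Apply" button may be enabled.
    static boolean isValidCharsetName(String name) {
        if (name == null || name.isEmpty()) {
            return false; // nothing typed yet
        }
        try {
            return Charset.isSupported(name); // false for legal but unavailable names
        } catch (IllegalCharsetNameException e) {
            return false; // syntactically illegal, e.g. contains spaces
        }
    }

    public static void main(String[] args) {
        // The field would be preset to the platform default encoding.
        System.out.println("Default: " + Charset.defaultCharset().name());
        System.out.println("Combo box entries: " + STANDARD_CHARSETS);
        for (String typed : new String[] {"UTF-8", "no such charset", "x-no-such-charset-12345"}) {
            System.out.println(typed + " -> Apply enabled: " + isValidCharsetName(typed));
        }
    }
}
```

In a wizard this check would be re-run on every modification of the combo box text, enabling or disabling the "Apply" button accordingly.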

It would be nice to add a warning that tells the user the "Apply" button is disabled because
the charset name is invalid or the charset is unsupported.
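One way such a warning could distinguish the two cases, sketched with plain java.nio (the class CharsetWarningSketch, the helper charsetWarning and the message texts are hypothetical): an illegal name raises IllegalCharsetNameException, while a legal but unavailable name makes Charset.isSupported return false.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

public class CharsetWarningSketch {

    // Hypothetical helper: returns null when the name is usable, otherwise a warning
    // explaining why the "Apply" button is disabled.
    static String charsetWarning(String name) {
        if (name == null || name.isEmpty()) {
            return "Please enter a charset name.";
        }
        try {
            if (Charset.isSupported(name)) {
                return null; // usable: no warning, "Apply" can be enabled
            }
            return "The charset \"" + name + "\" is not supported by this Java platform.";
        } catch (IllegalCharsetNameException e) {
            return "\"" + name + "\" is not a legal charset name.";
        }
    }

    public static void main(String[] args) {
        for (String typed : new String[] {"UTF-8", "x-no-such-charset-12345", "bad name"}) {
            String warning = charsetWarning(typed);
            System.out.println(typed + ": " + (warning == null ? "OK" : warning));
        }
    }
}
```

In an Eclipse wizard page the returned text could be shown via the page's error message area while the button stays disabled.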

> Encoding of text files during import should be configurable
> -----------------------------------------------------------
>
>                 Key: UIMA-1782
>                 URL: https://issues.apache.org/jira/browse/UIMA-1782
>             Project: UIMA
>          Issue Type: Improvement
>          Components: CasEditor
>    Affects Versions: 2.3
>            Reporter: Thomas Hampp
>            Assignee: Jörn Kottmann
>             Fix For: 2.3.1
>
>
> During import of text files into a corpus it seems to be impossible to control the encoding
> used. Looks like the default platform encoding is used (Latin 1 on Western Windows systems).
> The Eclipse default encoding settings for text files don't seem to affect import encoding.
> That makes it impossible to import documents with international characters in UTF-8.
> Ideally the encoding should be selectable in a drop-down field in the import wizard.
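The reported effect can be reproduced with a small sketch, assuming Java 7 or later; the class ImportEncodingSketch and the helper readDocument are hypothetical, not the CasEditor's import code. The import decodes the file bytes with an explicitly chosen charset; decoding UTF-8 bytes as Latin 1 (ISO-8859-1) turns each non-ASCII character into two wrong ones.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ImportEncodingSketch {

    // Hypothetical import step: decode the file with the charset the user selected
    // instead of relying on the platform default.
    static String readDocument(Path file, Charset encoding) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return new String(bytes, encoding);
    }

    public static void main(String[] args) throws IOException {
        String original = "J\u00F6rn"; // "Jörn", written with an escape to avoid source-encoding issues
        Path file = Files.createTempFile("import-sketch", ".txt");
        try {
            // In UTF-8 the o-umlaut is stored as the two bytes 0xC3 0xB6.
            Files.write(file, original.getBytes(StandardCharsets.UTF_8));

            String asUtf8 = readDocument(file, StandardCharsets.UTF_8);
            String asLatin1 = readDocument(file, StandardCharsets.ISO_8859_1);

            // Correct charset: the text round-trips.
            System.out.println(asUtf8.equals(original));
            // Wrong charset: each UTF-8 byte becomes its own Latin 1 character ("JÃ¶rn").
            System.out.println(asLatin1.equals("J\u00C3\u00B6rn"));
        } finally {
            Files.delete(file);
        }
    }
}
```

Both lines print true: only the explicitly selected UTF-8 charset recovers the original text.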

-- 
This message is automatically generated by JIRA.

