uima-dev mailing list archives

From Jérôme Rocheteau (JIRA) <uima-...@incubator.apache.org>
Subject [jira] Updated: (UIMA-1502) Using getSofaDataStream instead of getDocumentText
Date Wed, 19 Aug 2009 13:52:14 GMT

     [ https://issues.apache.org/jira/browse/UIMA-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Jérôme Rocheteau updated UIMA-1502:

    Attachment: wst.patch

This patch makes it possible to use the Whitespace tokenizer regardless of how collection
readers set the sofa data.

> Using getSofaDataStream instead of getDocumentText
> --------------------------------------------------
>                 Key: UIMA-1502
>                 URL: https://issues.apache.org/jira/browse/UIMA-1502
>             Project: UIMA
>          Issue Type: Improvement
>          Components: Sandbox-WhitespaceTokenizer
>            Reporter: Jérôme Rocheteau
>            Priority: Minor
>         Attachments: wst.patch
>   Original Estimate: 0.17h
>  Remaining Estimate: 0.17h
> I would like to know whether it would be better to get the CAS text content by calling the
> getSofaDataStream method of the CAS class instead of the getDocumentText method.
> CAS sofas can be set by calling the setSofaDataString method (aka setDocumentText),
> the setSofaDataArray method, or the setSofaDataURI method. However, the getDocumentText
> method (aka getSofaDataString) only returns the content of CASes whose sofas were set by
> the first method, whereas the getSofaDataStream method retrieves the content whichever
> method was called. A method that builds a String from an InputStream is then needed.
> Am I wrong in thinking this is an improvement?
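
The InputStream-to-String step mentioned above could look roughly like the sketch below. This is
not the content of wst.patch, which is not shown in this message. The class name StreamText and
the method name readAll are hypothetical, and the UTF-8 charset is an assumption, since the issue
does not say how the sofa data is encoded. In an annotator, the stream would presumably come from
the getSofaDataStream method of the CAS; here a ByteArrayInputStream stands in for it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: not part of UIMA and not taken from wst.patch.
class StreamText {

    // Reads the whole stream and decodes it with the given charset.
    // The stream is closed when reading finishes.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream a CAS would provide (assumption: UTF-8 bytes).
        InputStream in = new ByteArrayInputStream(
                "some document text".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in, StandardCharsets.UTF_8));
    }
}
```

Passing the charset explicitly avoids depending on the platform default, which matters when the
sofa was set from a byte array or a URI rather than from a String.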

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
