tika-dev mailing list archives

From "Ken Krugler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-431) Tika currently misuses the HTTP Content-Encoding header, and does not seem to use the charset part of the Content-Type header properly.
Date Mon, 12 Sep 2011 22:43:09 GMT

    [ https://issues.apache.org/jira/browse/TIKA-431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103144#comment-13103144 ]

Ken Krugler commented on TIKA-431:
----------------------------------

Hi Jan - sorry for the delay. Would end of week be soon enough?

-- Ken

> Tika currently misuses the HTTP Content-Encoding header, and does not seem to use the charset part of the Content-Type header properly.
> ---------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-431
>                 URL: https://issues.apache.org/jira/browse/TIKA-431
>             Project: Tika
>          Issue Type: Bug
>          Components: general
>            Reporter: Erik Hetzner
>            Assignee: Ken Krugler
>
> Tika currently misuses the HTTP Content-Encoding header, and does not seem to use the charset part of the Content-Type header properly.
> Content-Encoding is not for the charset. It is for values like gzip, deflate, compress, or identity.
> Charset is passed in with the Content-Type. For instance: text/html; charset=iso-8859-1
> Tika should, in my opinion, do the following:
> 1. Stop using Content-Encoding, unless it wants me to be able to pass in gzipped content in an input stream.
> 2. Parse and understand charset=... declarations if passed in the Metadata object.
> 3. Return charset=... declarations in the Metadata object if a charset is detected.
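
The header distinction described in the issue can be illustrated with a minimal, plain-JDK sketch. It is not Tika's API and not the fix that was eventually applied. The class HttpHeaderSketch and its methods charsetFromContentType and decodeContentEncoding are hypothetical names used only for illustration. The charset is taken from the Content-Type parameter, while Content-Encoding is only used to undo compression.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.util.Locale;
import java.util.Optional;
import java.util.zip.GZIPInputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper class illustrating the two headers; not part of Tika.
public class HttpHeaderSketch {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=iso-8859-1". Returns empty if the parameter is
    // missing or names a charset the JVM does not know.
    // Naive split on ';': does not handle semicolons inside quoted values.
    public static Optional<Charset> charsetFromContentType(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) { // parts[0] is the media type
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0 || !param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                continue;
            }
            String value = param.substring(eq + 1).trim();
            // Strip optional quotes, e.g. charset="UTF-8"
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            try {
                return Optional.of(Charset.forName(value));
            } catch (IllegalArgumentException e) {
                // Covers IllegalCharsetNameException and UnsupportedCharsetException
                return Optional.empty();
            }
        }
        return Optional.empty();
    }

    // Undoes the content coding named by Content-Encoding. This concerns
    // compression only and says nothing about the character set.
    public static InputStream decodeContentEncoding(InputStream in, String contentEncoding)
            throws IOException {
        if (contentEncoding == null) {
            return in;
        }
        switch (contentEncoding.trim().toLowerCase(Locale.ROOT)) {
            case "gzip":
            case "x-gzip":
                return new GZIPInputStream(in);
            case "deflate":
                // HTTP "deflate" is zlib-wrapped; raw deflate streams are not handled here.
                return new InflaterInputStream(in);
            case "identity":
            case "":
                return in;
            default:
                // e.g. "compress" (LZW) has no JDK decoder
                throw new IOException("Unsupported Content-Encoding: " + contentEncoding);
        }
    }
}
```

With these two helpers, a caller would first unwrap the body with decodeContentEncoding and then decode the resulting bytes with the charset from charsetFromContentType. The two headers are never mixed.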

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
