tika-dev mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-2047) TXTParser overwrites mime type/masks types that are subtype of text
Date Fri, 05 Aug 2016 12:11:20 GMT
Tim Allison created TIKA-2047:
---------------------------------

             Summary: TXTParser overwrites mime type/masks types that are subtype of text
                 Key: TIKA-2047
                 URL: https://issues.apache.org/jira/browse/TIKA-2047
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.13
            Reporter: Tim Allison
            Assignee: Tim Allison
            Priority: Minor


For vcal and other mime types that are subtypes of {{text/plain}}, the TXTParser overwrites
their mime type with "text/plain".  We should check what mime type has been passed in via the
Metadata and add the charset to that, e.g. "text/calendar; charset=ISO-8859-1"...right?
The current code always builds the content type from {{MediaType.TEXT_PLAIN}}:

{noformat}
            Charset charset = reader.getCharset();
            MediaType type = new MediaType(MediaType.TEXT_PLAIN, charset);
            metadata.set(Metadata.CONTENT_TYPE, type.toString());
{noformat}
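
A minimal sketch of the behaviour suggested above, for illustration only. It uses plain JDK strings rather than Tika's {{MediaType}}/{{Metadata}} classes so that it runs on its own; the class {{ContentTypeSketch}} and the helper {{withCharset}} are hypothetical names, not part of Tika. It also assumes that any {{text/*}} type passed in should be kept, which may be broader than what the parser should actually accept.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not Tika code: shows the proposed logic only.
class ContentTypeSketch {

    /**
     * Attaches the detected charset to the incoming type when that type is
     * under text/; otherwise falls back to text/plain.
     */
    static String withCharset(String incomingType, Charset charset) {
        String base = "text/plain";
        if (incomingType != null) {
            // Drop any parameters already present, e.g. "; charset=...".
            int semi = incomingType.indexOf(';');
            String candidate = (semi >= 0 ? incomingType.substring(0, semi) : incomingType)
                    .trim().toLowerCase(Locale.ROOT);
            // Assumption: every text/* type is treated as a subtype of text/plain.
            if (candidate.startsWith("text/") && candidate.length() > "text/".length()) {
                base = candidate;
            }
        }
        return base + "; charset=" + charset.name();
    }

    public static void main(String[] args) {
        // Incoming type is preserved and the detected charset is appended.
        System.out.println(withCharset("text/calendar", StandardCharsets.ISO_8859_1));
        // No incoming type: fall back to text/plain, as the parser does today.
        System.out.println(withCharset(null, StandardCharsets.UTF_8));
    }
}
```

In the parser itself, the incoming value would presumably be read from the Metadata ({{Metadata.CONTENT_TYPE}}) before it is overwritten, and the result built with the same {{MediaType(MediaType, Charset)}} constructor shown in the snippet above.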



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
