tika-dev mailing list archives

From "Joshua Turner (JIRA)" <j...@apache.org>
Subject [jira] Created: (TIKA-461) RFC822 messages not parsed
Date Thu, 08 Jul 2010 14:45:49 GMT
RFC822 messages not parsed
--------------------------

                 Key: TIKA-461
                 URL: https://issues.apache.org/jira/browse/TIKA-461
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.7
            Reporter: Joshua Turner


Given an RFC822 message exported from Thunderbird, AutoDetectParser produces an empty
body and a Metadata object containing only one key-value pair: "Content-Type=message/rfc822".
Calling MboxParser directly likewise gives an empty body, but with two metadata pairs:
"Content-Encoding=us-ascii" and "Content-Type=application/mbox".

A quick look at the source of MboxParser shows that the implementation is fairly naive. If
the integration can be sorted out, something like Apache James' mime4j might be a better bet.
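
For reference, here is a rough, stdlib-only sketch of the minimum a parser would be expected to recover from such a message: the header fields (with folded continuation lines joined) and the body that follows the first blank line. The class name Rfc822Sketch and its members are hypothetical and exist only for illustration. This is not Tika's MboxParser and not the mime4j API, and it ignores MIME multipart, encoded words, and repeated headers.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only: not part of Tika or mime4j.
final class Rfc822Sketch {
    // Header fields in order of appearance; a repeated field name overwrites the earlier value.
    final Map<String, String> headers = new LinkedHashMap<>();
    String body = "";

    static Rfc822Sketch parse(String raw) {
        Rfc822Sketch msg = new Rfc822Sketch();
        // Normalize CRLF to LF so the header/body separator is always "\n\n".
        String text = raw.replace("\r\n", "\n");
        int split = text.indexOf("\n\n");
        String headerBlock = split >= 0 ? text.substring(0, split) : text;
        msg.body = split >= 0 ? text.substring(split + 2) : "";

        String name = null;
        StringBuilder value = new StringBuilder();
        for (String line : headerBlock.split("\n")) {
            if (line.isEmpty()) {
                continue;
            }
            char first = line.charAt(0);
            // A line starting with whitespace continues the previous header field.
            if ((first == ' ' || first == '\t') && name != null) {
                value.append(' ').append(line.trim());
                continue;
            }
            // Flush the previous field before starting a new one.
            if (name != null) {
                msg.headers.put(name, value.toString());
            }
            name = null;
            value.setLength(0);
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip lines that are not "Name: value"
            }
            name = line.substring(0, colon).trim();
            value.append(line.substring(colon + 1).trim());
        }
        if (name != null) {
            msg.headers.put(name, value.toString());
        }
        return msg;
    }

    public static void main(String[] args) {
        Rfc822Sketch msg = parse(
            "From: a@example.org\r\nSubject: Hello\r\n world\r\n\r\nBody text\r\n");
        System.out.println(msg.headers);
        System.out.println(msg.body);
    }
}
```

A parser that handles the exported file correctly should yield at least this much: non-empty metadata for fields such as From and Subject, and a non-empty body.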

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

