tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2288) Remove metadata within body-element in OutlookExtractor
Date Tue, 07 Mar 2017 20:56:38 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15900140#comment-15900140 ]

Nick Burch commented on TIKA-2288:

I've got a feeling that this was partly because we didn't have metadata support as good
as today's at the time, and partly so that the preview looked "more email like".

Maybe the fix would be to produce an "email view" content handler, which adds metadata
like this back in at the top of the body? That'd give the "looks like email" effect, would let
current Outlook users who need that keep it, and would let them get the same behaviour for
the other mail parsers too if they wanted it.
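
A minimal sketch of what such an "email view" handler could look like, using only the JDK's SAX classes. The class name EmailViewHandler and the plain Map of headers are hypothetical stand-ins and not existing Tika API; a real version would presumably read the headers from Tika's Metadata object and wrap whatever handler the caller passes to the parser.

```java
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical "email view" decorator: forwards all SAX events unchanged,
 * and writes one paragraph per header right after the body element opens.
 */
class EmailViewHandler extends XMLFilterImpl {
    // Header name -> value, e.g. "From" -> "someone@example.org" (assumed input shape).
    private final Map<String, String> headers;

    EmailViewHandler(ContentHandler downstream, Map<String, String> headers) {
        setContentHandler(downstream);
        this.headers = headers;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        // Pass the original event through first, so the headers land inside the body.
        super.startElement(uri, localName, qName, atts);
        if ("body".equals(localName) || "body".equals(qName)) {
            for (Map.Entry<String, String> header : headers.entrySet()) {
                String line = header.getKey() + ": " + header.getValue();
                super.startElement(uri, "p", "p", new AttributesImpl());
                super.characters(line.toCharArray(), 0, line.length());
                super.endElement(uri, "p", "p");
            }
        }
    }
}
```

With something like this, the parsers themselves would emit only the message content in the body, and callers who want the "looks like email" output would opt in by wrapping their handler.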

> Remove metadata within body-element in OutlookExtractor
> -------------------------------------------------------
>                 Key: TIKA-2288
>                 URL: https://issues.apache.org/jira/browse/TIKA-2288
>             Project: Tika
>          Issue Type: Wish
>          Components: parser, server
>    Affects Versions: 1.14
>            Reporter: Sara Miller
>            Assignee: Tim Allison
>            Priority: Minor
> Tika's OutlookExtractor.java is not consistent with other mailparsers. 
> It would be nice to get the content of the mail in the body element in the same way as other mailparsers.
> Today, additional metadata such as sender, recipient, and attachment is added to the body.
> Source code: https://github.com/Silobreaker/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java#L190

This message was sent by Atlassian JIRA
