tika-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2455) Flag in metadata for alternative email bodies
Date Wed, 27 Sep 2017 10:45:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16182354#comment-16182354 ]

ASF GitHub Bot commented on TIKA-2455:

mattcg commented on issue #205: TIKA-2455: flag the containing multipart type
URL: https://github.com/apache/tika/pull/205#issuecomment-332482405
   @tballison updated with a test and also:
   1) only store the multipart subtype (the entire type is redundant);
   2) store the root multipart type when parsing the fields of the main message.
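
Point 1 above can be illustrated with a minimal sketch in plain Java. This is not the code from the pull request: the class and method names are hypothetical, and it only shows the idea of keeping the subtype (e.g. "alternative") and discarding the redundant "multipart/" prefix and any parameters.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of Tika or of pull request #205.
final class MultipartSubtypeSketch {

    // Returns the lower-cased subtype of a multipart Content-Type value
    // (for example "alternative"), or null if the value is not multipart.
    static String multipartSubtype(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Drop parameters such as "; boundary=...".
        int semicolon = contentType.indexOf(';');
        String mediaType = (semicolon >= 0 ? contentType.substring(0, semicolon) : contentType)
                .trim()
                .toLowerCase(Locale.ROOT);
        String prefix = "multipart/";
        // Reject non-multipart types and a bare "multipart/" with no subtype.
        if (!mediaType.startsWith(prefix) || mediaType.length() == prefix.length()) {
            return null;
        }
        return mediaType.substring(prefix.length());
    }

    public static void main(String[] args) {
        System.out.println(multipartSubtype("multipart/alternative; boundary=\"abc\""));
        System.out.println(multipartSubtype("text/plain; charset=utf-8"));
    }
}
```

Storing only the subtype keeps the metadata value short, since the "multipart/" part is implied by the field itself.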
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

> Flag in metadata for alternative email bodies
> ---------------------------------------------
>                 Key: TIKA-2455
>                 URL: https://issues.apache.org/jira/browse/TIKA-2455
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.16
>            Reporter: Matthew Caruana Galizia
>            Priority: Minor
>              Labels: attachments, multipart, rfc822, rfc822parser
> When multipart RFC822 emails are being parsed, there's no way to distinguish between alternative versions of the body and attachments.
> It would be ideal if some kind of flag were set in the metadata passed to the {{EmbeddedDocumentExtractor}} that indicates that the stream is an alternative.
> In GUIs that present the data extracted from the email, alternative bodies can be distinguished from attachments and presented separately.
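
The use case described in the issue could look roughly like the following on the consumer side. This is a sketch under stated assumptions, not Tika's API: each embedded part is modelled as a plain `Map<String, String>` of metadata, and the key name `multipart-subtype` is invented for illustration (the real key is whatever the pull request defines).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical consumer-side helper; it does not use any Tika classes.
final class AlternativeBodySketch {

    // Assumed metadata key, for illustration only.
    static final String SUBTYPE_KEY = "multipart-subtype";

    // True when the part's containing multipart is flagged as "alternative".
    static boolean isAlternativeBody(Map<String, String> metadata) {
        return "alternative".equalsIgnoreCase(metadata.get(SUBTYPE_KEY));
    }

    // Collects the parts a GUI would present as alternative bodies.
    static List<Map<String, String>> alternativeBodies(List<Map<String, String>> parts) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> part : parts) {
            if (isAlternativeBody(part)) {
                result.add(part);
            }
        }
        return result;
    }

    // Collects everything else, which a GUI would present as attachments.
    static List<Map<String, String>> attachments(List<Map<String, String>> parts) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> part : parts) {
            if (!isAlternativeBody(part)) {
                result.add(part);
            }
        }
        return result;
    }
}
```

A part with no flag at all is treated as an attachment here, which matches the behaviour before any such flag existed.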

This message was sent by Atlassian JIRA
