tika-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TIKA-1771) lower magic priority xhtml magic priority to ensure emails detected as message/rfc822
Date Sun, 18 Oct 2015 19:22:05 GMT

     [ https://issues.apache.org/jira/browse/TIKA-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann resolved TIKA-1771.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.11

Thanks [~jeremybmerrill]! 
{noformat}
[chipotle:~/tmp/tika1.11] mattmann% svn commit -m "Fix for TIKA-1771 lower magic priority
xhtml magic priority to ensure emails detected as message/rfc822 contributed by Jeremy B.
Merrill <jeremy.merrill@nytimes.com> this closes #58."
Sending        CHANGES.txt
Sending        tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Transmitting file data ..
Committed revision 1709301.
[chipotle:~/tmp/tika1.11] mattmann% 
{noformat}


> lower magic priority xhtml magic priority to ensure emails detected as message/rfc822
> -------------------------------------------------------------------------------------
>
>                 Key: TIKA-1771
>                 URL: https://issues.apache.org/jira/browse/TIKA-1771
>             Project: Tika
>          Issue Type: Improvement
>          Components: detector
>            Reporter: Jeremy B. Merrill
>            Assignee: Chris A. Mattmann
>            Priority: Critical
>             Fix For: 1.11
>
>
> Some emails I have (happy to share if you want) contain XHTML as one part of a
> multipart email. Prior to this pull request, the priority of the
> application/xhtml+xml magic detector was 50, equal to the priority of the
> message/rfc822 detector. Because of the relative position of the two detectors
> in tika-mimetypes.xml, the emails were incorrectly detected as XHTML documents.
> With this PR, downgrading the priority of application/xhtml+xml to 40 lets the
> more-sensitive email magic detectors take precedence, so the emails are
> properly detected as message/rfc822.
> I have not run this through the govdocs tester or anything other than my own
> documents, so, full disclosure, this could cause false-negative XHTML
> detections elsewhere.
> I should note this occurs on trunk, from GitHub, up to date as of Tuesday-ish.
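
The priority behaviour described in the issue can be illustrated with a small, simplified model. This is only a sketch and not Tika's actual detector code: the `Rule` class, the `detect` and `rules` helpers, the substring markers, and the tie-break (the earlier-declared rule wins when priorities are equal) are all assumptions made for illustration. It shows why a multipart email containing XHTML could be reported as application/xhtml+xml when both priorities are 50, and why lowering the XHTML priority to 40 lets message/rfc822 win.

```java
import java.util.ArrayList;
import java.util.List;

public class MagicPrioritySketch {

    /** Hypothetical stand-in for a magic rule: a type, a priority, and a marker string. */
    static final class Rule {
        final String type;
        final int priority;
        final String marker;

        Rule(String type, int priority, String marker) {
            this.type = type;
            this.priority = priority;
            this.marker = marker;
        }

        boolean matches(String content) {
            // Simplification: real magic rules match bytes at offsets, not substrings.
            return content.contains(marker);
        }
    }

    /**
     * Returns the type of the highest-priority matching rule. On a tie, the rule
     * declared earlier wins (an assumed tie-break for this sketch only).
     */
    static String detect(List<Rule> rules, String content) {
        Rule best = null;
        for (Rule rule : rules) {
            if (rule.matches(content) && (best == null || rule.priority > best.priority)) {
                best = rule;
            }
        }
        return best == null ? "application/octet-stream" : best.type;
    }

    /** Builds the two rules; the XHTML rule is assumed to be declared first. */
    static List<Rule> rules(int xhtmlPriority) {
        List<Rule> rules = new ArrayList<>();
        rules.add(new Rule("application/xhtml+xml", xhtmlPriority, "<html xmlns="));
        rules.add(new Rule("message/rfc822", 50, "Subject:"));
        return rules;
    }

    public static void main(String[] args) {
        String email = "From: a@example.org\nSubject: hi\n"
                + "Content-Type: multipart/alternative\n\n"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>hi</body></html>\n";
        System.out.println(detect(rules(50), email)); // equal priorities
        System.out.println(detect(rules(40), email)); // XHTML lowered to 40
    }
}
```

In this model the fix amounts to calling `rules(40)` instead of `rules(50)`. The reporter's caveat also shows up here: any document that matches both markers is now classified as an email, which is where false-negative XHTML detections could come from.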



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
