tika-dev mailing list archives

From "Boris Naguet (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1175) MS Money files wrongly detected as True Type Font
Date Tue, 01 Oct 2013 10:01:26 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782777#comment-13782777 ]

Boris Naguet commented on TIKA-1175:
------------------------------------

I have no "open-sourced" file to attach here, but you can test this with files found on the internet:
http://filemare.com/search/*.mny
They all start with this magic.

> MS Money files wrongly detected as True Type Font
> -------------------------------------------------
>
>                 Key: TIKA-1175
>                 URL: https://issues.apache.org/jira/browse/TIKA-1175
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.3, 1.4
>            Reporter: Boris Naguet
>            Priority: Minor
>
> The TTF magic is probably not specific enough: it incorrectly detects MS Money files as TTF files, and parsing then throws an exception.
> {quote}
> Caused by: ! java.io.IOException: head is mandatory
> ! at org.apache.fontbox.ttf.AbstractTTFParser.parseTables(AbstractTTFParser.java:107)

> {quote}
> Here is the magic detection code that I added to {{custom-mimetypes.xml}}, which solves the problem:
> {code:xml}
> <mime-info>
> 	<mime-type type="application/x-msmoney">
> 		<glob pattern="*.mny" />
> 		<magic priority="60">
> 			<match value="0x000100004D534953414D204461746162617365" type="string" offset="0" />
> 		</magic>
> 	</mime-type>
> </mime-info>
> {code}
> It can replace the existing empty {{application/x-msmoney}} mime-type in {{tika-mimetypes.xml}}.
> The magic comes from
> http://filesignatures.net/index.php?search=mny&mode=EXT
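
For anyone who wants to check a sample file without running Tika, here is a minimal Python sketch. It is not Tika code, and the names MSISAM_MAGIC, TTF_SFNT_VERSION and has_msisam_magic are made up for illustration. It assumes the TTF magic in question is the 4-byte sfnt version 0x00010000 that TrueType fonts start with; the MS Money signature quoted above begins with the same four bytes, which would explain the misdetection.

```python
# Illustrative sketch only; these names are hypothetical and not part of Tika.

# The 19-byte signature from the <match> value in the issue description.
MSISAM_MAGIC = bytes.fromhex("000100004D534953414D204461746162617365")

# Assumption: the TTF magic is the TrueType sfnt version tag, 0x00010000.
TTF_SFNT_VERSION = bytes.fromhex("00010000")


def has_msisam_magic(path):
    """Return True if the file at `path` starts with the MSISAM signature."""
    with open(path, "rb") as f:
        return f.read(len(MSISAM_MAGIC)) == MSISAM_MAGIC


# The longer signature shares its first four bytes with the sfnt version tag,
# so a detector that only looks at those four bytes cannot tell them apart.
shares_ttf_prefix = MSISAM_MAGIC.startswith(TTF_SFNT_VERSION)

# The remaining bytes are plain ASCII.
ascii_tail = MSISAM_MAGIC[len(TTF_SFNT_VERSION):].decode("ascii")
```

Calling has_msisam_magic("sample.mny") on a downloaded file (the file name is only an example) shows whether it carries the longer signature. Because that signature is more specific than the shared 4-byte prefix, a match on it can be preferred over the TTF match.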



--
This message was sent by Atlassian JIRA
(v6.1#6144)
