tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-913) MagicMime detection of msdos executables does not work
Date Thu, 10 May 2012 11:41:53 GMT

    [ https://issues.apache.org/jira/browse/TIKA-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272258#comment-13272258
] 

Nick Burch commented on TIKA-913:
---------------------------------

If anyone wants to add a parser for PE (32/64) files, this doc should be handy: <http://msdn.microsoft.com/en-us/windows/hardware/gg463119.aspx>.
We should be able to extract the common metadata, such as the creation date, along with lots of
other info too.
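
As a rough illustration of the "creation date" idea, here is a minimal Python sketch (not Tika code) that reads the TimeDateStamp field from the COFF header of a PE file. It assumes the linker timestamp is what would be reported as the creation date, and the function name pe_link_timestamp is made up for this example. The offsets used (e_lfanew at 0x3C, the "PE\0\0" signature, TimeDateStamp 8 bytes after the signature) come from the PE/COFF format.

```python
import struct
from datetime import datetime, timezone


def pe_link_timestamp(data: bytes):
    """Return the COFF TimeDateStamp as a UTC datetime, or None if not a PE file.

    Hypothetical helper for illustration; not part of Tika.
    """
    # Every PE file starts with an MS-DOS stub whose first two bytes are "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    # e_lfanew: 4-byte little-endian offset of the PE signature, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return None
    if len(data) < pe_offset + 12:
        return None
    # COFF header follows the signature:
    # Machine (2 bytes), NumberOfSections (2 bytes), TimeDateStamp (4 bytes).
    (stamp,) = struct.unpack_from("<I", data, pe_offset + 8)
    return datetime.fromtimestamp(stamp, tz=timezone.utc)
```

Note that this value is set by the linker, so it reflects build time rather than when the file was written to disk, and some toolchains store a non-date value there.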

Based on this info, and the osdev page, I've added mime magic for what look to be the common
variants in r1336610.
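
For anyone curious what content-based detection of these files involves, below is a minimal Python sketch of the checks, independent of file name. It is not the magic added in r1336610 and does not use Tika's mime type names; the function describe_executable and the returned labels are made up for illustration. The field offsets and constants (Machine 0x014C / 0x8664 / 0x0200, optional header magic 0x010B / 0x020B) are taken from the PE/COFF format.

```python
import struct

# Values of the COFF "Machine" field (only a few common ones listed here).
MACHINES = {0x014C: "x86", 0x8664: "x86-64", 0x0200: "Itanium"}
# Values of the optional header "Magic" field.
OPT_MAGIC = {0x010B: "PE32", 0x020B: "PE32+"}


def describe_executable(data: bytes) -> str:
    """Classify a byte buffer by content only (hypothetical labels)."""
    if data[:2] != b"MZ":
        return "not an MZ executable"
    if len(data) < 0x40:
        return "MS-DOS executable"
    # Offset of the PE signature is stored little-endian at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        return "MS-DOS executable"
    if len(data) < pe_offset + 26:
        return "PE executable (truncated)"
    # Machine is the first COFF field, right after the 4-byte signature.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    # The optional header starts after the 20-byte COFF header.
    (magic,) = struct.unpack_from("<H", data, pe_offset + 24)
    fmt = OPT_MAGIC.get(magic, "PE")
    arch = MACHINES.get(machine, "unknown machine")
    return f"{fmt} executable ({arch})"
```

Because only the bytes are inspected, a binary renamed to putty.jpg or to plain putty would be classified the same way as putty.exe.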
                
> MagicMime detection of msdos executables does not work
> ------------------------------------------------------
>
>                 Key: TIKA-913
>                 URL: https://issues.apache.org/jira/browse/TIKA-913
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.1
>         Environment: Linux, JDK 1.6
>            Reporter: Torsten Krah
>              Labels: detection, magic, mime
>
> Mime detection does not work as expected (at least by me), in contrast to e.g. the sourceforge mime-util detection or the "file" utility.
> For example, running detection on the PuTTY MS-DOS executable gives wrong results:
> krah@sf050:~$ java -jar /tmp/tika-app-1.1.jar --detect /tmp/putty
> application/octet-stream
> krah@sf050:~$ java -jar /tmp/tika-app-1.1.jar --detect /tmp/putty.jpg
> image/jpeg
> krah@sf050:~$ java -jar /tmp/tika-app-1.1.jar --detect /tmp/putty.exe
> application/x-msdownload
> It's the same binary resource every time, only with different names.
> In contrast, "file" outputs:
> krah@sf050:~$ file /tmp/putty
> /tmp/putty: PE32 executable for MS Windows (GUI) Intel 80386 32-bit
> krah@sf050:~$ file /tmp/putty.jpg
> /tmp/putty.jpg: PE32 executable for MS Windows (GUI) Intel 80386 32-bit
> krah@sf050:~$ file /tmp/putty.exe
> /tmp/putty.exe: PE32 executable for MS Windows (GUI) Intel 80386 32-bit
> So magic mime detection should be able to detect that this is actually an executable.
> E.g. for a PDF it does work:
> krah@sf050:~$ java -jar /tmp/tika-app-1.1.jar --detect /tmp/print.pdf
> application/pdf
> krah@sf050:~$ java -jar /tmp/tika-app-1.1.jar --detect /tmp/print
> application/pdf
> krah@sf050:~$ java -jar /tmp/tika-app-1.1.jar --detect /tmp/print.jpg 
> application/pdf
> Here Tika detects what is expected.

