tika-dev mailing list archives

From "Benjamin Douglas (JIRA)" <j...@apache.org>
Subject [jira] Commented: (TIKA-570) If this is a BMP, my name is horatio alger
Date Sun, 12 Dec 2010 18:31:01 GMT

    [ https://issues.apache.org/jira/browse/TIKA-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12970626#action_12970626 ]

Benjamin Douglas commented on TIKA-570:

What about this (from the Wikipedia article):

Offset: 0x1A
Purpose: the number of color planes being used. Must be set to 1.

This means that there is always a two-byte 0x01 0x00 sequence at a fixed offset near
the beginning of the file. The field is in the header, and granted, there are different
versions of the header; but the description in the article suggests that the majority of
headers have it, possibly modulo OS/2 flavors. The pattern 0x01 0x00 is unlikely to appear
in most plain text, especially text that begins with ASCII. The BMP file in the unit tests
has this signature, for example.
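
To make the idea concrete, here is a rough sketch of such a check in Java. It is not Tika code: the class and method names (BmpPlanesCheck, looksLikeBmp) are made up for illustration, and it assumes the common layout in which a 14-byte file header starting with "BM" is followed by a header that stores the planes field at offset 0x1A in little-endian order. OS/2-style headers would need a different offset and are not handled here.

```java
// Hypothetical helper, not part of Tika: combines the "BM" magic with the
// "color planes must be 1" field described above.
final class BmpPlanesCheck {

    // Assumed offset of the 16-bit planes field (little-endian), per the
    // header layout quoted from the Wikipedia article.
    private static final int PLANES_OFFSET = 0x1A;

    private BmpPlanesCheck() {
    }

    // Returns true only if the prefix starts with "BM" and carries the
    // 0x01 0x00 sequence at offset 0x1A.
    static boolean looksLikeBmp(byte[] prefix) {
        if (prefix == null || prefix.length < PLANES_OFFSET + 2) {
            return false; // too short to contain the planes field
        }
        boolean hasMagic = prefix[0] == 'B' && prefix[1] == 'M';
        boolean planesIsOne = prefix[PLANES_OFFSET] == 0x01
                && prefix[PLANES_OFFSET + 1] == 0x00;
        return hasMagic && planesIsOne;
    }
}
```

With a check along these lines, a text file that merely starts with the letters "BM" would be rejected, because the bytes at offsets 0x1A and 0x1B would be ordinary printable characters rather than 0x01 0x00.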

> If this is a BMP, my name is horatio alger
> ------------------------------------------
>                 Key: TIKA-570
>                 URL: https://issues.apache.org/jira/browse/TIKA-570
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.8
>            Reporter: Benson Margulies
>         Attachments: C80A5295-EFC7-44DD-9A39-B882D1EC6F38.txt, C80A5295-EFC7-44DD-9A39-B882D1EC6F38.txt
> I am attaching a file which Tika is identifying as a bmp. It contains ordinary text.
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.image.ImageParser@20a19811
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:199)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:137)
> 	at com.basistech.jug.FileHarvester.process(FileHarvester.java:204)
> 	at com.basistech.jug.FileHarvester.harvestDir(FileHarvester.java:165)
> 	at com.basistech.jug.FileHarvester.harvestDir(FileHarvester.java:179)
> 	at com.basistech.jug.FileHarvester.harvest(FileHarvester.java:135)
> 	at com.basistech.jug.FileHarvester.run(FileHarvester.java:247)
> 	at java.lang.Thread.run(Thread.java:680)
> Caused by: java.lang.RuntimeException: New BMP version not implemented yet.
> 	at com.sun.imageio.plugins.bmp.BMPImageReader.readHeader(BMPImageReader.java:462)
> 	at com.sun.imageio.plugins.bmp.BMPImageReader.getWidth(BMPImageReader.java:174)
> 	at org.apache.tika.parser.image.ImageParser.parse(ImageParser.java:75)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
> 	... 8 more

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
