tika-dev mailing list archives

From André Ricardo (JIRA) <j...@apache.org>
Subject [jira] Updated: (TIKA-474) Tika parsing corrupt mp3
Date Fri, 06 Aug 2010 14:45:16 GMT

     [ https://issues.apache.org/jira/browse/TIKA-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

André Ricardo updated TIKA-474:
-------------------------------

    Attachment: test.mp3

This is "A corrupt MP3 file that has been truncated half way through the ID3v2 frames" from
the Nutch 0.9/1.0 sample MP3 files.

> Tika parsing corrupt mp3
> ------------------------
>
>                 Key: TIKA-474
>                 URL: https://issues.apache.org/jira/browse/TIKA-474
>             Project: Tika
>          Issue Type: Improvement
>          Components: cli
>    Affects Versions: 0.7
>         Environment: Linux Mandriva 2010 based OS (Linux Caixa Mágica 15)
>            Reporter: André Ricardo
>         Attachments: test.mp3
>
>
> I was trying some MP3s from the Nutch 0.9/1.0 samples in the tika-app CLI. The one described as
> "A corrupt MP3 file that has been truncated half way through the ID3v2 frames" returned this:
> $ java -jar tika-app-0.7.jar -v -m ~/nutch-0.9/src/plugin/parse-mp3/sample/test.mp3
> Exception in thread "main" org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.mp3.Mp3Parser@1bf3d87
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:138)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:99)
>     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:169)
>     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:62)
> Caused by: java.io.IOException: Tried to read 259186 bytes, but only 65526 bytes present
>     at org.apache.tika.parser.mp3.ID3v2Frame.readFully(ID3v2Frame.java:160)
>     at org.apache.tika.parser.mp3.ID3v2Frame.<init>(ID3v2Frame.java:110)
>     at org.apache.tika.parser.mp3.ID3v2Frame.createFrameIfPresent(ID3v2Frame.java:81)
>     at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:128)
>     at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:64)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:132)
>     ... 3 more
> I also tried the latest trunk from GitHub, which reproduces the problem:
> $ java -jar tika-app-0.8-SNAPSHOT.jar -v -m ~/nutch-0.9/src/plugin/parse-mp3/sample/test.mp3
> Exception in thread "main" org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.mp3.Mp3Parser@e79839
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:169)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:110)
>     at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:193)
>     at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:72)
> Caused by: java.io.IOException: Tried to read 259186 bytes, but only 65526 bytes present
>     at org.apache.tika.parser.mp3.ID3v2Frame.readFully(ID3v2Frame.java:160)
>     at org.apache.tika.parser.mp3.ID3v2Frame.<init>(ID3v2Frame.java:110)
>     at org.apache.tika.parser.mp3.ID3v2Frame.createFrameIfPresent(ID3v2Frame.java:81)
>     at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:133)
>     at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:64)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:163)
>     ... 3 more
> The mp3 is here: http://github.com/apache/nutch/raw/tags/release-1.0/src/plugin/parse-mp3/sample/test.mp3
> Tika parsed all the other MP3 samples without errors.
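
For illustration only: the trace shows a frame that declares 259186 bytes while only 65526 bytes could be read. One possible way to tolerate that kind of truncation is a bounded read that reports a short read instead of throwing. The sketch below is not Tika's code or its fix; the class name TruncatedFrameSketch, the ReadResult type, and the readUpTo method are all hypothetical names made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical sketch, not part of Tika: read a frame body whose declared
// length may exceed what the (truncated) stream can actually supply.
public class TruncatedFrameSketch {

    /** Bytes actually obtained, plus a flag saying whether the stream ended early. */
    static final class ReadResult {
        final byte[] data;
        final boolean truncated;

        ReadResult(byte[] data, boolean truncated) {
            this.data = data;
            this.truncated = truncated;
        }
    }

    /**
     * Reads up to declaredLength bytes. A short read is reported through the
     * truncated flag rather than by throwing an IOException.
     */
    static ReadResult readUpTo(InputStream in, int declaredLength) throws IOException {
        if (declaredLength < 0) {
            throw new IllegalArgumentException("declaredLength must not be negative");
        }
        byte[] buffer = new byte[declaredLength];
        int total = 0;
        while (total < declaredLength) {
            int n = in.read(buffer, total, declaredLength - total);
            if (n < 0) {
                break; // end of stream reached before the declared length
            }
            total += n;
        }
        if (total < declaredLength) {
            // Keep only the bytes that were really present.
            return new ReadResult(Arrays.copyOf(buffer, total), true);
        }
        return new ReadResult(buffer, false);
    }

    public static void main(String[] args) throws IOException {
        // Sizes taken from the stack trace above: 259186 declared, 65526 present.
        InputStream in = new ByteArrayInputStream(new byte[65526]);
        ReadResult result = readUpTo(in, 259186);
        System.out.println("read " + result.data.length
                + " bytes, truncated=" + result.truncated);
    }
}
```

With the sizes from the trace, this prints "read 65526 bytes, truncated=true". A caller could then decide whether to skip the incomplete frame or keep whatever metadata was already collected; which of those is appropriate for the MP3 parser is left open here.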

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

