tika-dev mailing list archives

From "Ray Gauss II (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TIKA-1179) A corrupt mp3 file can cause an infinite loop in Mp3Parser
Date Fri, 04 Oct 2013 19:22:47 GMT

     [ https://issues.apache.org/jira/browse/TIKA-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ray Gauss II resolved TIKA-1179.
--------------------------------

    Resolution: Cannot Reproduce
      Assignee: Ray Gauss II

I've just confirmed the described behavior in Tika 1.4. However, it appears the file is parsed
just fine in 1.5!

You can verify by downloading a 1.5 snapshot of {{tika-app}} ([current link|https://repository.apache.org/content/groups/snapshots/org/apache/tika/tika-app/1.5-SNAPSHOT/tika-app-1.5-20130927.201341-30.jar]),
running the app, i.e.:
{code}
java -jar tika-app-1.5-20130927.201341-30.jar
{code}
and dropping {{corrupt.mp3}} onto the app window.

> A corrupt mp3 file can cause an infinite loop in Mp3Parser
> ----------------------------------------------------------
>
>                 Key: TIKA-1179
>                 URL: https://issues.apache.org/jira/browse/TIKA-1179
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.4
>            Reporter: Marius Dumitru Florea
>            Assignee: Ray Gauss II
>             Fix For: 1.5
>
>         Attachments: corrupt.mp3
>
>
> I have a thread that indexes (among other things) files using Apache Solr. This thread
hangs (still running but with no progress) when trying to extract metadata from the mp3 file
attached to this issue. Here are a couple of thread dumps taken at various moments:
> {noformat}
> "XWiki Solr index thread" daemon prio=10 tid=0x0000000003b72800 nid=0x64b5 runnable [0x00007f46f4617000]
>    java.lang.Thread.State: RUNNABLE
> 	at org.apache.commons.io.input.AutoCloseInputStream.close(AutoCloseInputStream.java:63)
> 	at org.apache.commons.io.input.AutoCloseInputStream.afterRead(AutoCloseInputStream.java:77)
> 	at org.apache.commons.io.input.ProxyInputStream.read(ProxyInputStream.java:99)
> 	at java.io.BufferedInputStream.fill(Unknown Source)
> 	at java.io.BufferedInputStream.read1(Unknown Source)
> 	at java.io.BufferedInputStream.read(Unknown Source)
> 	- locked <0x00000000cb7094e8> (a java.io.BufferedInputStream)
> 	at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
> 	at java.io.FilterInputStream.read(Unknown Source)
> 	at org.apache.tika.io.TailStream.read(TailStream.java:117)
> 	at org.apache.tika.io.TailStream.skip(TailStream.java:140)
> 	at org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
> 	at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
> 	at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
> 	at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> 	at org.apache.tika.Tika.parseToString(Tika.java:380)
> 	...
> {noformat}
> {noformat}
> "XWiki Solr index thread" daemon prio=10 tid=0x0000000003b72800 nid=0x64b5 runnable [0x00007f46f4618000]
>    java.lang.Thread.State: RUNNABLE
> 	at org.apache.tika.io.TailStream.skip(TailStream.java:133)
> 	at org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
> 	at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
> 	at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
> 	at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> 	at org.apache.tika.Tika.parseToString(Tika.java:380)
> 	...
> {noformat}
> {noformat}
> "XWiki Solr index thread" daemon prio=10 tid=0x0000000003b72800 nid=0x64b5 runnable [0x00007f46f4617000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.io.BufferedInputStream.read1(Unknown Source)
> 	at java.io.BufferedInputStream.read(Unknown Source)
> 	- locked <0x00000000cb1be170> (a java.io.BufferedInputStream)
> 	at org.apache.tika.io.ProxyInputStream.read(ProxyInputStream.java:99)
> 	at java.io.FilterInputStream.read(Unknown Source)
> 	at org.apache.tika.io.TailStream.read(TailStream.java:117)
> 	at org.apache.tika.io.TailStream.skip(TailStream.java:140)
> 	at org.apache.tika.parser.mp3.MpegStream.skipStream(MpegStream.java:283)
> 	at org.apache.tika.parser.mp3.MpegStream.skipFrame(MpegStream.java:160)
> 	at org.apache.tika.parser.mp3.Mp3Parser.getAllTagHandlers(Mp3Parser.java:193)
> 	at org.apache.tika.parser.mp3.Mp3Parser.parse(Mp3Parser.java:71)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> 	at org.apache.tika.Tika.parseToString(Tika.java:380)
> 	...
> {noformat}
> This makes our Solr indexer very fragile, as it prevents it from indexing other files,
thus leading to incomplete search results.
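
For indexers that cannot upgrade right away, one possible mitigation is to run each parse on a separate worker thread and give up after a deadline, so that a single stuck file does not block the whole indexing thread. The following is only a minimal sketch using the JDK's java.util.concurrent classes. The class name ParseWithTimeout, the method runWithTimeout, and the timeout values are hypothetical and are not part of Tika or Solr. The lambdas in main are stand-ins for a real call such as Tika.parseToString.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of Tika: bounds how long the caller waits for a parse.
public class ParseWithTimeout {

    /**
     * Runs the task on a daemon worker thread and waits at most timeoutMillis.
     * Returns the task's result, or Optional.empty() if it did not finish in time.
     */
    public static <T> Optional<T> runWithTimeout(Callable<T> task, long timeoutMillis)
            throws ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> future = executor.submit(task);
        try {
            return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // cancel(true) only interrupts the worker; code that never checks
            // the interrupt flag will keep running in the background.
            future.cancel(true);
            return Optional.empty();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a parse that completes normally.
        Optional<String> fast = runWithTimeout(() -> "extracted text", 1000);
        System.out.println(fast.isPresent()); // prints true

        // Stand-in for a parse that never returns on its own.
        Optional<String> stuck = runWithTimeout(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                // busy loop, similar in spirit to the RUNNABLE thread in the dumps
            }
            return "never";
        }, 200);
        System.out.println(stuck.isPresent()); // prints false
    }
}
```

Note that this only frees the caller. A loop that ignores interruption would keep consuming CPU on the worker thread, so a guard like this is a stopgap and not a substitute for moving to a version in which the file parses correctly.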



--
This message was sent by Atlassian JIRA
(v6.1#6144)
