tika-dev mailing list archives

From Jürgen Enge (JIRA) <j...@apache.org>
Subject [jira] [Updated] (TIKA-1184) Infinite halt on parsing old files (e.g. mp3, ms-dos drivers, ...)
Date Mon, 21 Oct 2013 06:56:43 GMT

     [ https://issues.apache.org/jira/browse/TIKA-1184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jürgen Enge updated TIKA-1184:
------------------------------

    Description: 
Tika hangs when identifying several types of files. The following example is an MP3 file with
corrupt metadata; other file types with the same problem include MS-DOS device drivers
(*.sys).
I am not a Java programmer, but my guess is that Tika is trying to seek() within a file to
a position beyond the end of the file.
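The seek hypothesis above is unconfirmed, but it matches a well-known Java failure mode: a loop that calls `InputStream.skip()` until a target offset is reached can spin forever when that offset lies past the end of the data, because `skip()` is allowed to return 0 (for example at end of stream). The sketch below shows a guarded skip that uses a single `read()` probe to detect a true end of stream and fails with an `EOFException` instead of looping. It uses only the JDK. `SkipGuard` and `skipFully` are hypothetical names, and this is not Tika's code or its actual fix.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class SkipGuard {

    // Skips exactly n bytes or throws EOFException.
    // InputStream.skip() may return 0, so a naive
    //   while (remaining > 0) remaining -= in.skip(remaining);
    // can spin forever once the target offset lies past the end of the data.
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
                continue;
            }
            // skip() made no progress: probe with read() to tell a stream
            // that is merely slow apart from a genuine end of stream.
            if (in.read() == -1) {
                throw new EOFException("Tried to skip " + n + " bytes, "
                        + (n - remaining) + " available");
            }
            remaining--; // the probe consumed one byte
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[100]);
        try {
            skipFully(in, 1_000_000L); // offset far beyond the 100-byte "file"
            System.out.println("skipped");
        } catch (EOFException e) {
            System.out.println("EOF: " + e.getMessage());
        }
    }
}
```

Running `main` prints `EOF: Tried to skip 1000000 bytes, 100 available` instead of hanging. The design choice is to fail loudly on a corrupt offset, so a bad size field in a damaged header becomes an exception rather than an endless loop.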

> java -jar tika-app-1.4.jar -m /u01/fk/xd/2/c/16866bc96e6a316d8cbdbd7ca2ce1e
[hangs forever without error message]
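Until the cause is found, a program that embeds detection can at least limit how long it waits, rather than blocking forever the way the CLI does here. The sketch below is a minimal example built only on `java.util.concurrent` and requires Java 8 or later for the lambdas. `BoundedDetect`, `callWithTimeout`, and the stand-in tasks are hypothetical; they are not a Tika feature or API. Note that `cancel(true)` only interrupts the worker. A CPU-bound loop that never checks for interruption keeps running, and the daemon flag only prevents it from keeping the JVM alive.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedDetect {

    // Runs task on a worker thread and gives up after timeoutMillis,
    // returning null on timeout instead of blocking the caller forever.
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-detect");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        try {
            Future<T> future = executor.submit(task);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupts; code that ignores interrupts keeps running
                return null;
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a detection call that never returns.
        Callable<String> hangingDetect = () -> {
            Thread.sleep(Long.MAX_VALUE);
            return "never reached";
        };
        String result = callWithTimeout(hangingDetect, 500);
        System.out.println(result == null ? "timed out" : result);

        // Hypothetical stand-in for a call that completes normally.
        String quick = callWithTimeout(() -> "done", 500);
        System.out.println(quick);
    }
}
```

This prints `timed out` followed by `done`. The caller can then log the offending file and move on to the next one.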

ffmpeg reports some warnings about the duration:
> ffmpeg -i /u01/fk/xd/2/c/16866bc96e6a316d8cbdbd7ca2ce1e
[mp3 @ 0x633240] max_analyze_duration 5000000 reached at 5015510
[mp3 @ 0x633240] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/u01/fk/xd/2/c/16866bc96e6a316d8cbdbd7ca2ce1e':
  Metadata:
    artist          : 
    album           : 
  Duration: 00:15:29.10, start: 0.000000, bitrate: 192 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16, 192 kb/s



  was:
Tika hangs when identifying several types of files. The following example is an MP3 file with
corrupt metadata; other file types with the same problem include MS-DOS device drivers
(*.sys).
I am not a Java programmer, but my guess is that Tika is trying to seek() within a file to
a position beyond the end of the file.

> java -jar tika-app-1.4.jar -m /u01/fk/xd/2/c/16866bc96e6a316d8cbdbd7ca2ce1e
[hangs forever without error message]

ffmpeg reports some warnings about the duration:
> ffmpeg -i /u01/fk/xd/2/c/16866bc96e6a316d8cbdbd7ca2ce1e
[mp3 @ 0x633240] max_analyze_duration 5000000 reached at 5015510
[mp3 @ 0x633240] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/u01/fk/xd/2/c/16866bc96e6a316d8cbdbd7ca2ce1e':
  Metadata:
    artist          : Jacques Lacan
    album           : UbuWeb / PennSound Archive
  Duration: 00:15:29.10, start: 0.000000, bitrate: 192 kb/s
    Stream #0:0: Audio: mp3, 44100 Hz, stereo, s16, 192 kb/s




> Infinite halt on parsing old files (e.g. mp3, ms-dos drivers, ...)
> ------------------------------------------------------------------
>
>                 Key: TIKA-1184
>                 URL: https://issues.apache.org/jira/browse/TIKA-1184
>             Project: Tika
>          Issue Type: Bug
>          Components: cli, parser
>    Affects Versions: 1.4
>         Environment: SUSE Linux Enterprise Server 11 SP3  (x86_64)
> java version "1.7.0"
> Java(TM) SE Runtime Environment (build pxa6470sr4fp2-20130426_01(SR4 FP2))
> IBM J9 VM (build 2.6, JRE 1.7.0 Linux amd64-64 Compressed References 20130422_146026 (JIT enabled, AOT enabled)
> J9VM - R26_Java726_SR4_FP2_20130422_1320_B146026
> JIT  - r11.b03_20130131_32403ifx4
> GC   - R26_Java726_SR4_FP2_20130422_1320_B146026_CMPRSS
> J9CL - 20130422_146026)
> JCL - 20130425_01 based on Oracle 7u21-b09
>            Reporter: Jürgen Enge



--
This message was sent by Atlassian JIRA
(v6.1#6144)
