tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-887) Tika fails to parse some MP3 tags correctly and produces null characters in value
Date Mon, 04 Feb 2013 22:34:13 GMT

    [ https://issues.apache.org/jira/browse/TIKA-887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570705#comment-13570705 ]

Nick Burch commented on TIKA-887:
---------------------------------

I've just tried with the most recent build of Tika from SVN, and I'm not seeing any random
control characters turn up. I therefore think that the work on the MP3 parser over the last
year has solved it.

Any chance you could double check yourself, and close the ticket if it's now behaving?
                
> Tika fails to parse some MP3 tags correctly and produces null characters in value
> ---------------------------------------------------------------------------------
>
>                 Key: TIKA-887
>                 URL: https://issues.apache.org/jira/browse/TIKA-887
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0, 1.1
>            Reporter: Jens Hübel
>            Priority: Minor
>
> I have a problem when extracting the comment tag from an MP3 file. It contains an invalid
> prefix, then a '\0' character, and then the real value of the tag. This happens with files
> downloaded from www.jamendo.com, for example this one:
> http://storage.newjamendo.com/download/track/450545/mp32/Swansong.mp3
> It may be that the tags are not created properly on this site, but at least tools like
> mp3tag display them correctly.
> The extracted value looks like this: eng http://www.jamendo.com Attribution-Noncommercial-Share Alike 3.0
> At position 3 there is a null character. The tag value should start with http...
> Here is the byte sequence at the beginning of this file:
> 49 44 33 04 00 00 00 01 18 32 54 49 54 32 00 00 
> 00 09 00 00 03 53 77 61 6E 73 6F 6E 67 54 50 45 
> 31 00 00 00 0E 00 00 03 4A 6F 73 68 20 57 6F 6F 
> 64 77 61 72 64 54 41 4C 42 00 00 00 0C 00 00 03 
> 42 72 65 61 64 63 72 75 6D 62 73 54 44 52 4C 00 
> 00 00 05 00 00 03 32 30 30 39 43 4F 4D 4D 00 00 
> 00 22 00 00 03 65 6E 67 49 44 33 20 76 31 20 43 
> 6F 6D 6D 65 6E 74 00 41 74 74 72 69 62 75 74 69 
> 6F 6E 20 33 2E 30 54 43 4F 4E 00 00 00 06 00 00 
> 03 28 32 35 35 29 54 50 55 42 00 00 00 08 00 00 
> 03 4A 61 6D 65 6E 64 6F 43 4F 4D 4D 00 00 00 2C 
> 00 00 03 65 6E 67 00 68 74 74 70 3A 2F 2F 77 77 
> 77 2E 6A 61 6D 65 6E 64 6F 2E 63 6F 6D 20 41 74 
> 74 72 69 62 75 74 69 6F 6E 20 33 2E 30 20 54 43 
> 4F 50 00 00 01 1F 00 00 03 32 30 30 39 2D 31 30 
> 2D 32 31 54 31 31 3A 31 31 3A 32 30 2B 30 31 3A 
> 30 30 20 4A 6F 73 68 20 57 6F 6F 64 77 61 72 64 
> 2E 20 4C 69 63 65 6E 73 65 64 20 74 6F 20 74 68
> ID3......2TIT2.......SwansongTPE1.......Josh WoodwardTALB.......BreadcrumbsTDRL.......2009COMM..."...engID3 v1 Comment.Attribution 3.0TCON.......(255)TPUB.......JamendoCOMM...,...eng.http://www.jamendo.com Attribution 3.0 TCOP.......2009-10-21T11:11:20+01:00 Josh Woodward. Licensed to th
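
For reference, an ID3v2.4 COMM frame body is laid out as: one text-encoding byte, a three-byte ISO-639-2 language code ("eng" here), a terminated short content description (empty in the second COMM frame of the dump, hence the 0x00 right after "eng"), and then the comment text itself. The reported value appears to be what you get when the language code and the description terminator are kept as part of the text. The Python sketch below is hypothetical and is not Tika's actual parser code; the names syncsafe, split_terminated, parse_comm, iter_frames and DUMP are made up for illustration. It assumes an ID3v2.4 tag with no unsynchronisation and no extended header, ignores per-frame flags, and decodes the first 224 bytes of the dump above.

```python
# Hypothetical sketch, not Tika's implementation: walk the frames of an
# ID3v2.4 tag and split COMM frames into language, description and text.
# Assumes ID3v2.4 only, no unsynchronisation, no extended header, and
# ignores per-frame flags (compression, data length indicator, ...).

# ID3v2.4 text-encoding byte -> Python codec name
ENCODINGS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}


def syncsafe(b: bytes) -> int:
    """Decode a 4-byte syncsafe integer (7 significant bits per byte)."""
    value = 0
    for byte in b:
        value = (value << 7) | (byte & 0x7F)
    return value


def split_terminated(data: bytes, encoding: int) -> tuple[bytes, bytes]:
    """Split a terminated string off the front of data; return (string, rest)."""
    if encoding in (1, 2):
        # UTF-16: the terminator is 0x00 0x00 on a 2-byte boundary.
        for i in range(0, len(data) - 1, 2):
            if data[i] == 0 and data[i + 1] == 0:
                return data[:i], data[i + 2:]
    else:
        i = data.find(b"\x00")
        if i >= 0:
            return data[:i], data[i + 1:]
    return data, b""  # no terminator found: everything is the string


def parse_comm(body: bytes) -> tuple[str, str, str]:
    """Return (language, description, text) for a COMM frame body."""
    encoding = body[0]
    codec = ENCODINGS[encoding]
    language = body[1:4].decode("latin-1")  # e.g. "eng"; not part of the value
    raw_desc, raw_text = split_terminated(body[4:], encoding)
    return language, raw_desc.decode(codec), raw_text.decode(codec).rstrip("\x00")


def iter_frames(tag: bytes):
    """Yield (frame_id, body) pairs; stops at padding or truncated input."""
    if tag[:3] != b"ID3" or tag[3] != 4:
        raise ValueError("not an ID3v2.4 tag")
    if tag[5] & 0xC0:
        raise NotImplementedError("unsynchronisation/extended header not handled")
    # Declared tag size excludes the 10-byte header; cap at the bytes we have.
    end = min(10 + syncsafe(tag[6:10]), len(tag))
    pos = 10
    while pos + 10 <= end:
        frame_id = tag[pos:pos + 4]
        if frame_id == b"\x00\x00\x00\x00":
            break  # padding after the last frame
        size = syncsafe(tag[pos + 4:pos + 8])
        if pos + 10 + size > end:
            break  # frame runs past the tag or the available bytes
        yield frame_id.decode("latin-1"), tag[pos + 10:pos + 10 + size]
        pos += 10 + size


# First 224 bytes of Swansong.mp3, copied from the dump in the report.
DUMP = bytes.fromhex("""
49 44 33 04 00 00 00 01 18 32 54 49 54 32 00 00
00 09 00 00 03 53 77 61 6E 73 6F 6E 67 54 50 45
31 00 00 00 0E 00 00 03 4A 6F 73 68 20 57 6F 6F
64 77 61 72 64 54 41 4C 42 00 00 00 0C 00 00 03
42 72 65 61 64 63 72 75 6D 62 73 54 44 52 4C 00
00 00 05 00 00 03 32 30 30 39 43 4F 4D 4D 00 00
00 22 00 00 03 65 6E 67 49 44 33 20 76 31 20 43
6F 6D 6D 65 6E 74 00 41 74 74 72 69 62 75 74 69
6F 6E 20 33 2E 30 54 43 4F 4E 00 00 00 06 00 00
03 28 32 35 35 29 54 50 55 42 00 00 00 08 00 00
03 4A 61 6D 65 6E 64 6F 43 4F 4D 4D 00 00 00 2C
00 00 03 65 6E 67 00 68 74 74 70 3A 2F 2F 77 77
77 2E 6A 61 6D 65 6E 64 6F 2E 63 6F 6D 20 41 74
74 72 69 62 75 74 69 6F 6E 20 33 2E 30 20 54 43
""")

for frame_id, body in iter_frames(DUMP):
    if frame_id == "COMM":
        print(parse_comm(body))
# ('eng', 'ID3 v1 Comment', 'Attribution 3.0')
# ('eng', '', 'http://www.jamendo.com Attribution 3.0 ')
```

Splitting on the first terminator after the language code keeps "eng" and the description out of the value. For the UTF-16 encodings (0x01 and 0x02) the terminator is two zero bytes, which is why split_terminated scans in 2-byte steps there.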

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
