tika-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-2761) XML Structured Text Is Missing Metadata Fields for mp3 files
Date Mon, 22 Oct 2018 18:04:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16659245#comment-16659245 ]

Hudson commented on TIKA-2761:
------------------------------

SUCCESS: Integrated in Jenkins build Tika-trunk #1583 (See [https://builds.apache.org/job/Tika-trunk/1583/])
TIKA-2761 -- write as much metadata as possible before writing to xhtml. (tallison: [https://github.com/apache/tika/commit/f7c3ece80e2db7e060deb0be3746d7dfa003303b])
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/mp3/Mp3ParserTest.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/mp3/Mp3Parser.java
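
The commit message describes an ordering fix. The sketch below is a simplified, Tika-free model of why ordering matters. It assumes, as the commit message suggests, that the XHTML <head> and its <meta> elements are written once, from whatever metadata has been set when body output begins. Any key added after that point never reaches the XML, although it is still visible on the metadata object. HeadFirstWriter, writeThenSet and setThenWrite are hypothetical names and are not Tika's XHTMLContentHandler API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MetadataOrderDemo {

    /**
     * Hypothetical streaming writer (not Tika's API): the <head> is emitted
     * exactly once, from whatever is in the metadata map at the moment the
     * first body element (or the end of the document) is written.
     * Values are not XML-escaped in this toy model.
     */
    static final class HeadFirstWriter {
        private final Map<String, String> metadata;
        private final StringBuilder out = new StringBuilder("<html>");
        private boolean headWritten = false;

        HeadFirstWriter(Map<String, String> metadata) {
            this.metadata = metadata; // live reference, only read when the head is flushed
        }

        private void flushHead() {
            if (headWritten) {
                return; // the head can only be written once
            }
            out.append("<head>");
            for (Map.Entry<String, String> e : metadata.entrySet()) {
                out.append("<meta name=\"").append(e.getKey())
                   .append("\" content=\"").append(e.getValue()).append("\"/>");
            }
            out.append("</head><body>");
            headWritten = true;
        }

        void element(String tag, String text) {
            flushHead(); // first body element forces the head out
            out.append('<').append(tag).append('>').append(text)
               .append("</").append(tag).append('>');
        }

        String finish() {
            flushHead();
            return out.append("</body></html>").toString();
        }
    }

    /** Bug pattern: body content is written before the sample rate is known. */
    static String writeThenSet() {
        Map<String, String> md = new LinkedHashMap<>();
        md.put("dc:title", "Rehab (Glee Cast Version)");
        HeadFirstWriter w = new HeadFirstWriter(md);
        w.element("h1", md.get("dc:title"));
        md.put("xmpDM:audioSampleRate", "44100"); // too late: head already flushed
        return w.finish();
    }

    /** Fix pattern: collect as much metadata as possible first, then write. */
    static String setThenWrite() {
        Map<String, String> md = new LinkedHashMap<>();
        md.put("dc:title", "Rehab (Glee Cast Version)");
        md.put("xmpDM:audioSampleRate", "44100");
        HeadFirstWriter w = new HeadFirstWriter(md);
        w.element("h1", md.get("dc:title"));
        return w.finish();
    }

    public static void main(String[] args) {
        System.out.println(writeThenSet()); // no audioSampleRate <meta>
        System.out.println(setThenWrite()); // audioSampleRate <meta> present
    }
}
```

In both runs the map ends up holding the sample rate; only the moment it is set relative to the first body element decides whether it appears in the head.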


> XML Structured Text Is Missing Metadata Fields for mp3 files
> ------------------------------------------------------------
>
>                 Key: TIKA-2761
>                 URL: https://issues.apache.org/jira/browse/TIKA-2761
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata
>    Affects Versions: 1.19.1
>         Environment: All
>            Reporter: Nick Sincaglia
>            Assignee: Tim Allison
>            Priority: Minor
>             Fix For: 2.0.0, 1.20
>
>
> I am using the Tika 1.19 GUI to extract metadata from an .mp3 file. The sample rate is
> available and I am able to access it, but only as a string or as part of a JSON document.
> I am working in XML and would like to use XML as a content handler. But when the metadata
> is returned as 'structured text' (XML), the sample rate is not returned. I have tried using
> Tika 1.19 in a Maven project and experimented with different contentHandlers, and the same
> issue occurs. I cannot seem to get the sample rate returned in an XML doc, but I am able to
> access the data from the metadata object itself. If the metadata is returned as a string,
> the sample rate is there; if it is returned as XML, the sample rate is not returned. I am
> wondering what I am doing wrong or misunderstanding. Perhaps an issue with the parser or
> contentHandler that is used?
>  
> *Tika 1.19 'Metadata' view (sample rate is available):*
>  
> Author: Glee Cast
> Content-Length: 8251946
> Content-Type: audio/mpeg
> X-Parsed-By: org.apache.tika.parser.DefaultParser
> X-Parsed-By: org.apache.tika.parser.mp3.Mp3Parser
> X-TIKA:digest:MD5: e0bdf3a0e171fca838604f9baad46612
> X-TIKA:digest:SHA256: ea1e4aa998f2c6e80139fa100c62fc1ee17652cf702cd484532b90183e7c5cc0
> channels: 2
> creator: Glee Cast
> dc:creator: Glee Cast
> dc:title: Rehab (Glee Cast Version)
> meta:author: Glee Cast
> resourceName: USQX90900223_A4_T7.mp3
> samplerate: 44100
> title: Rehab (Glee Cast Version)
> version: MPEG 3 Layer III Version 1
> xmpDM:album: Glee: The Music, The Complete Season One
> xmpDM:artist: Glee Cast
> xmpDM:audioChannelType: Stereo
> xmpDM:audioCompressor: MP3
> xmpDM:audioSampleRate: 44100
> xmpDM:duration: 206301.296875
> xmpDM:genre:
> xmpDM:logComment: XXX -
> (P) 2009 Twentieth Century Fox Television - USQX90900223
> xmpDM:releaseDate:
> xmpDM:trackNumber: 4
>  
>  
> *Tika 1.19 'Structured Text' view (no sample rate):*
>  
> <?xml version="1.0" encoding="UTF-8"?><html xmlns="http://www.w3.org/1999/xhtml">
> <head>
> <meta name="xmpDM:genre" content=""/>
> <meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
> <meta name="X-Parsed-By" content="org.apache.tika.parser.mp3.Mp3Parser"/>
> <meta name="creator" content="Glee Cast"/>
> <meta name="xmpDM:album" content="Glee: The Music, The Complete Season One"/>
> <meta name="xmpDM:releaseDate" content=""/>
> <meta name="meta:author" content="Glee Cast"/>
> <meta name="xmpDM:artist" content="Glee Cast"/>
> <meta name="X-TIKA:digest:SHA256" content="ea1e4aa998f2c6e80139fa100c62fc1ee17652cf702cd484532b90183e7c5cc0"/>
> <meta name="dc:creator" content="Glee Cast"/>
> <meta name="xmpDM:audioCompressor" content="MP3"/>
> <meta name="resourceName" content="USQX90900223_A4_T7.mp3"/>
> <meta name="xmpDM:logComment" content="XXX - &#10;(P) 2009 Twentieth Century Fox Television - USQX90900223"/>
> <meta name="dc:title" content="Rehab (Glee Cast Version)"/>
> <meta name="Author" content="Glee Cast"/>
> <meta name="Content-Length" content="8251946"/>
> <meta name="X-TIKA:digest:MD5" content="e0bdf3a0e171fca838604f9baad46612"/>
> <meta name="Content-Type" content="audio/mpeg"/>
> <title>Rehab (Glee Cast Version)</title>
> </head>
> <body><h1>Rehab (Glee Cast Version)</h1>
> <p>Glee Cast</p>
> <p>Glee: The Music, The Complete Season One, track 4</p>
> <p>206301.3</p>
> <p>XXX -  (P) 2009 Twentieth Century Fox Television - USQX90900223</p>
> </body></html>
>  
> *Tika 1.19 Recursive JSON view (the sample rate is there):*
>  
> [
>   {
>     "Author": "Glee Cast",
>     "Content-Type": "audio/mpeg",
>     "X-Parsed-By": [
>       "org.apache.tika.parser.DefaultParser",
>       "org.apache.tika.parser.mp3.Mp3Parser"
>     ],
>     "X-TIKA:content": "Rehab (Glee Cast Version)\nGlee Cast\nGlee: The Music, The Complete Season One, track 4\n206301.3\nXXX - \n(P) 2009 Twentieth Century Fox Television - USQX90900223\n",
>     "X-TIKA:digest:MD5": "e0bdf3a0e171fca838604f9baad46612",
>     "X-TIKA:digest:SHA256": "ea1e4aa998f2c6e80139fa100c62fc1ee17652cf702cd484532b90183e7c5cc0",
>     "X-TIKA:parse_time_millis": "86",
>     "channels": "2",
>     "creator": "Glee Cast",
>     "dc:creator": "Glee Cast",
>     "dc:title": "Rehab (Glee Cast Version)",
>     "meta:author": "Glee Cast",
>     "samplerate": "44100",
>     "title": "Rehab (Glee Cast Version)",
>     "version": "MPEG 3 Layer III Version 1",
>     "xmpDM:album": "Glee: The Music, The Complete Season One",
>     "xmpDM:artist": "Glee Cast",
>     "xmpDM:audioChannelType": "Stereo",
>     "xmpDM:audioCompressor": "MP3",
>     "xmpDM:audioSampleRate": "44100",
>     "xmpDM:duration": "206301.296875",
>     "xmpDM:genre": "",
>     "xmpDM:logComment": "XXX - \n(P) 2009 Twentieth Century Fox Television - USQX90900223",
>     "xmpDM:releaseDate": "",
>     "xmpDM:trackNumber": "4"
>   }
> ]
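
Comparing the key list of the 'Metadata' view with the name attributes of the <meta> elements in the 'Structured Text' view shows exactly which fields are lost. The sketch below does that set difference with the values copied from the report above; the class and constant names are illustrative only.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

final class MissingMetaFields {

    // Keys listed in the 'Metadata' view of the report.
    static final Set<String> METADATA_VIEW = new TreeSet<>(Arrays.asList(
        "Author", "Content-Length", "Content-Type", "X-Parsed-By",
        "X-TIKA:digest:MD5", "X-TIKA:digest:SHA256", "channels", "creator",
        "dc:creator", "dc:title", "meta:author", "resourceName", "samplerate",
        "title", "version", "xmpDM:album", "xmpDM:artist", "xmpDM:audioChannelType",
        "xmpDM:audioCompressor", "xmpDM:audioSampleRate", "xmpDM:duration",
        "xmpDM:genre", "xmpDM:logComment", "xmpDM:releaseDate", "xmpDM:trackNumber"));

    // name attributes of the <meta> elements in the 'Structured Text' view.
    static final Set<String> STRUCTURED_VIEW = new TreeSet<>(Arrays.asList(
        "xmpDM:genre", "X-Parsed-By", "creator", "xmpDM:album", "xmpDM:releaseDate",
        "meta:author", "xmpDM:artist", "X-TIKA:digest:SHA256", "dc:creator",
        "xmpDM:audioCompressor", "resourceName", "xmpDM:logComment", "dc:title",
        "Author", "Content-Length", "X-TIKA:digest:MD5", "Content-Type"));

    /** Keys present in the metadata object but absent from the XHTML head, sorted. */
    static Set<String> missing(Set<String> metadataKeys, Set<String> xhtmlMetaNames) {
        Set<String> result = new TreeSet<>(metadataKeys);
        result.removeAll(xhtmlMetaNames);
        return result;
    }

    public static void main(String[] args) {
        // Prints: [channels, samplerate, title, version, xmpDM:audioChannelType,
        //          xmpDM:audioSampleRate, xmpDM:duration, xmpDM:trackNumber]
        System.out.println(missing(METADATA_VIEW, STRUCTURED_VIEW));
    }
}
```

The title value still reaches the XHTML as the <title> element, so the practical loss is the other seven keys, including both sample-rate keys. That pattern is consistent with those fields being set only after the head had been written, which is what the commit message above addresses.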

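The issue lists Fix For 2.0.0 and 1.20, so upgrading is the direct remedy. On 1.19.x, since the reporter confirms the values are on the metadata object after parsing, one possible workaround is to backfill the missing <meta> elements into the XML afterwards. The sketch below is a hypothetical helper, not part of Tika. It assumes the metadata has already been copied into a Map<String, List<String>> and that meta elements are serialized with name as the first attribute, as in the output above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MetaBackfill {
    // Assumption: meta elements look like <meta name="..." content="..."/>, name first.
    private static final Pattern META_NAME = Pattern.compile("<meta name=\"([^\"]*)\"");

    /** Escapes a string for use inside a double-quoted XML attribute value. */
    static String escapeAttr(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\n': sb.append("&#10;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Inserts a <meta> before </head> for each value of every key not already in the head. */
    static String addMissingMeta(String xhtml, Map<String, List<String>> metadata) {
        int headEnd = xhtml.indexOf("</head>");
        if (headEnd < 0) {
            return xhtml; // no head element found; leave the document unchanged
        }
        // Names the head already carries.
        Set<String> present = new HashSet<>();
        Matcher m = META_NAME.matcher(xhtml.substring(0, headEnd));
        while (m.find()) {
            present.add(m.group(1));
        }
        StringBuilder extra = new StringBuilder();
        for (Map.Entry<String, List<String>> e : metadata.entrySet()) {
            String name = escapeAttr(e.getKey());
            if (present.contains(name)) {
                continue; // already written by the parser
            }
            for (String value : e.getValue()) {
                extra.append("<meta name=\"").append(name)
                     .append("\" content=\"").append(escapeAttr(value)).append("\"/>\n");
            }
        }
        return xhtml.substring(0, headEnd) + extra + xhtml.substring(headEnd);
    }

    public static void main(String[] args) {
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head>\n"
                + "<meta name=\"xmpDM:audioCompressor\" content=\"MP3\"/>\n"
                + "<title>Rehab (Glee Cast Version)</title>\n"
                + "</head><body></body></html>";
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("xmpDM:audioCompressor", Arrays.asList("MP3"));
        metadata.put("xmpDM:audioSampleRate", Arrays.asList("44100"));
        metadata.put("channels", Arrays.asList("2"));
        System.out.println(addMissingMeta(xhtml, metadata));
    }
}
```

The map can be filled from Tika's Metadata object by iterating metadata.names() and metadata.getValues(name). Callers may want to skip title, which is already rendered as <title>. Regex matching keeps the sketch short; a DOM-based rewrite would be more robust against other serializations.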


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
