tika-dev mailing list archives

From "Nick Burch (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-851) M4V and M4A detection invalid
Date Fri, 27 Jan 2012 15:38:13 GMT

    [ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194854#comment-13194854 ]

Nick Burch commented on TIKA-851:

From http://developer.apple.com/library/mac/#documentation/QuickTime/QTFF/QTFFChap1/qtff1.html#//apple_ref/doc/uid/TP40000939-CH203-BBCGDDDF
"Generally speaking, atoms can be present in any order. Do not conclude that a particular
atom is not present until you have parsed all the atoms in the file.

An exception is the file type atom, which typically identifies the file as a QuickTime movie.
If present, this atom precedes any movie atom, movie data, preview, or free space atoms. If
you encounter one of these other atom types prior to finding a file type atom, you may assume
the file type atom is not present. (This atom is introduced in the QuickTime File Format Specification
for 2004, and is not present in QuickTime movie files created prior to 2004)."
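
To make the rule quoted above concrete, here is a minimal sketch of checking only the first atom of a stream. It assumes the usual atom header layout (a 4-byte big-endian size followed by a 4-character type) and that an ftyp atom carries its major brand in the next 4 bytes. FtypSniffer and majorBrand are hypothetical names, not part of Tika, and the byte arrays are hand-built samples rather than real files. The brand value "M4V " is an assumption used for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Tika: inspects only the first atom of a stream. */
public class FtypSniffer {

    /**
     * Returns the major brand of a leading ftyp atom, or null if the first
     * atom is not ftyp (in which case the quoted text says there is none).
     * Sketch only: the special sizes 0 and 1 (to end of file, 64-bit size) are not handled.
     */
    public static String majorBrand(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] type = new byte[4];
        byte[] brand = new byte[4];
        try {
            long size = data.readInt() & 0xFFFFFFFFL; // 32-bit big-endian atom size
            data.readFully(type);                      // 4-character atom type
            if (!"ftyp".equals(new String(type, StandardCharsets.US_ASCII))) {
                return null;                           // first atom is something else
            }
            if (size < 12) {
                return null;                           // too small to hold a major brand
            }
            data.readFully(brand);                     // major brand, e.g. "M4V "
        } catch (EOFException e) {
            return null;                               // stream too short to decide
        }
        return new String(brand, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // Synthetic 16-byte ftyp atom: size, "ftyp", major brand, minor version.
        byte[] m4v = {0, 0, 0, 16, 'f', 't', 'y', 'p', 'M', '4', 'V', ' ', 0, 0, 0, 0};
        // Synthetic stream that starts with a movie data atom instead.
        byte[] noFtyp = {0, 0, 0, 8, 'm', 'd', 'a', 't'};
        System.out.println(majorBrand(new ByteArrayInputStream(m4v)));
        System.out.println(majorBrand(new ByteArrayInputStream(noFtyp)));
    }
}
```

A detector built along these lines would only need the first dozen or so bytes of the file, and could treat a null result as "no file type atom, fall back to other checks".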

So, if there is an ftyp atom, it should be first, and if the first atom isn't an ftyp then there
isn't one. The AtomParsely link is handy; that should help with producing a metadata extracting
> M4V and M4A detection invalid
> -----------------------------
>                 Key: TIKA-851
>                 URL: https://issues.apache.org/jira/browse/TIKA-851
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.0
>            Reporter: Alexander Chow
>             Fix For: 1.1
>         Attachments: TIKA-851.patch
> When the mime type of an M4V file is detected using its name only, it returns video/x-m4v. When it is detected using the InputStream (hence utilising the MagicDetector), it incorrectly returns video/quicktime.
> Using the sample M4V file from Apple's [knowledge base|http://support.apple.com/kb/HT1425]:
> {code:title=TikaTest.java}
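> import java.io.File;
> import java.io.InputStream;
> import org.apache.tika.detect.DefaultDetector;
> import org.apache.tika.detect.Detector;
> import org.apache.tika.io.TikaInputStream;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.mime.MimeTypes;
>
> public class TikaTest {
> 	public static void main(String[] args) throws Exception {
> 		String userHome = System.getProperty("user.home");
> 		File file = new File(userHome + "/Desktop/sample_iPod.m4v");
> 		InputStream is = TikaInputStream.get(file);
> 		Detector detector = new DefaultDetector(
> 			MimeTypes.getDefaultMimeTypes());
> 		Metadata metadata = new Metadata();
> 		metadata.set(Metadata.RESOURCE_NAME_KEY, file.getName());
> 		// detect() is expected to mark and reset the stream, so it is reused below
> 		System.out.println("File + filename: " + detector.detect(is, metadata));
> 		System.out.println("File only:       " + detector.detect(is, new Metadata()));
> 		System.out.println("Filename only:   " + detector.detect(null, metadata));
> 	}
> }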
> {code}
> Renders the output:
> {code}
> File + filename: video/quicktime
> File only:       video/quicktime
> Filename only:   video/x-m4v
> {code}
> Moreover, if the same test is run against an M4A file, the results are even more incorrect:
> {code}
> File + filename: video/quicktime
> File only:       video/quicktime
> Filename only:   application/octet-stream
> {code}

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

