tika-dev mailing list archives

From "Nick Burch (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-851) M4V and M4A detection invalid
Date Fri, 27 Jan 2012 14:54:11 GMT

    [ https://issues.apache.org/jira/browse/TIKA-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194813#comment-13194813 ]

Nick Burch commented on TIKA-851:
---------------------------------

It looks like most files (though I'm not sure it's all of them) start with an ftyp atom, with the
"ftyp" type code at byte 4 (bytes 0-3 hold the atom's size). "ftyp" is followed by a 4-byte string
(space-padded if needed) giving the main type, the so-called major brand. There's a list of the
common ones at http://www.ftyps.com/
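As a rough illustration of that check, the sketch below reads bytes 4-7 and, if they spell "ftyp", returns the 4-byte major brand that follows. The class name `FtypAtStart` and the method `majorBrand` are hypothetical, and the sketch assumes the ftyp atom is the first atom in the file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper: reads the major brand when the ftyp atom starts the file. */
final class FtypAtStart {

    private static final byte[] FTYP = "ftyp".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns the major brand (e.g. "M4V " or "M4A "), or null if the header is
     * too short or bytes 4-7 are not "ftyp".
     */
    static String majorBrand(byte[] header) {
        if (header == null || header.length < 12) {
            return null; // needs 4 bytes of size, 4 of "ftyp" and 4 of brand
        }
        byte[] type = Arrays.copyOfRange(header, 4, 8);
        if (!Arrays.equals(type, FTYP)) {
            return null;
        }
        // Bytes 8-11: the major brand, space padded to 4 characters
        return new String(header, 8, 4, StandardCharsets.US_ASCII);
    }
}
```

The returned brand keeps its padding, so comparisons should use "M4V " rather than "M4V".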

I've added more specific matches for the common types in r1236700. Using the tika-app jar,
I can now correctly detect MP4 video, Apple M4V video, MP4 audio and old QuickTime .mov files (the
last via the lower-priority fallback).
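One way to picture "more specific matches" plus a "lower priority fallback" is a brand-to-type table consulted first, with a generic atom check used only when no brand matches. This is a hypothetical sketch: the class `BrandMimeSketch`, the brand list and the fallback atom names are illustrative assumptions, not the actual rules committed in r1236700.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: specific ftyp brand matches, then a generic fallback. */
final class BrandMimeSketch {

    // Illustrative brand table (assumed, not copied from Tika's definitions)
    private static final Map<String, String> BRANDS = new HashMap<>();
    static {
        BRANDS.put("isom", "video/mp4");
        BRANDS.put("mp41", "video/mp4");
        BRANDS.put("mp42", "video/mp4");
        BRANDS.put("M4V ", "video/x-m4v");
        BRANDS.put("M4A ", "audio/mp4");
        BRANDS.put("qt  ", "video/quicktime");
    }

    static String detect(byte[] header) {
        if (header == null || header.length < 8) {
            return "application/octet-stream";
        }
        String type = new String(header, 4, 4, StandardCharsets.US_ASCII);
        if (type.equals("ftyp") && header.length >= 12) {
            String brand = new String(header, 8, 4, StandardCharsets.US_ASCII);
            String mime = BRANDS.get(brand);
            if (mime != null) {
                return mime; // specific, higher-priority match
            }
        }
        // Lower-priority fallback: an atom type commonly seen at byte 4 of
        // QuickTime-style files, including older .mov files that have no ftyp
        switch (type) {
            case "ftyp": case "moov": case "mdat": case "free": case "wide":
                return "video/quicktime";
            default:
                return "application/octet-stream";
        }
    }
}
```

Ordering is the point of the design: checking the specific brands before the generic fallback is what keeps an M4V or M4A file from being reported as video/quicktime, the behaviour described in the issue below.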

I'm not sure whether the ftyp atom always has to come first. If it doesn't, this detection won't
work. Longer term, a proper format-aware detector would be best, ideally one that also
understands the rest of the format so it can report on the different streams, etc.
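If the ftyp atom is not always first, a format-aware approach could walk the top-level atoms until it finds ftyp. The sketch below assumes the ISO base media file format box layout (a 4-byte big-endian size, a 4-byte type, an optional 64-bit "largesize" when the size field is 1, and size 0 meaning "extends to end of file"). The class `AtomWalker`, the method `findMajorBrand` and the `maxAtoms` limit are hypothetical. A real Tika `Detector` would also need to mark() and reset() the stream and bound the total number of bytes it consumes, which this sketch does not do.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: find the ftyp major brand by walking top-level atoms. */
final class AtomWalker {

    /**
     * Returns the 4-byte major brand of the first top-level ftyp atom found
     * within maxAtoms atoms, or null if none is found or the stream ends early.
     */
    static String findMajorBrand(InputStream raw, int maxAtoms) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        try {
            for (int i = 0; i < maxAtoms; i++) {
                long size = in.readInt() & 0xFFFFFFFFL; // unsigned 32-bit atom size
                byte[] type = new byte[4];
                in.readFully(type);
                long headerLength = 8;
                if (size == 1) {
                    size = in.readLong(); // 64-bit "largesize" follows the type
                    headerLength = 16;
                }
                if ("ftyp".equals(new String(type, StandardCharsets.US_ASCII))) {
                    byte[] brand = new byte[4];
                    in.readFully(brand);
                    return new String(brand, StandardCharsets.US_ASCII);
                }
                if (size == 0 || size < headerLength) {
                    return null; // atom runs to end of file, or size is malformed
                }
                skipFully(in, size - headerLength);
            }
        } catch (EOFException truncated) {
            // Stream ended before an ftyp atom was found
        }
        return null;
    }

    /** Skips exactly n bytes, throwing EOFException if the stream ends first. */
    private static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                // skip() may return 0 without being at EOF; read a byte to make progress
                if (in.read() < 0) {
                    throw new EOFException("atom extends past end of stream");
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }
}
```

Reporting on individual streams, as suggested above, would mean descending into the moov atom's trak children, which is beyond this sketch.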
                
> M4V and M4A detection invalid
> -----------------------------
>
>                 Key: TIKA-851
>                 URL: https://issues.apache.org/jira/browse/TIKA-851
>             Project: Tika
>          Issue Type: Bug
>          Components: mime
>    Affects Versions: 1.0
>            Reporter: Alexander Chow
>             Fix For: 1.1
>
>
> When the mime type of an M4V file is detected using its name only, it returns video/x-m4v.
> When it is detected using the InputStream (hence utilising the MagicDetector), it incorrectly
> returns video/quicktime.
> Using the sample M4V file from Apple's [knowledge base|http://support.apple.com/kb/HT1425]:
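> {code:title=TikaTest.java}
> import java.io.File;
> import java.io.InputStream;
>
> import org.apache.tika.detect.DefaultDetector;
> import org.apache.tika.detect.Detector;
> import org.apache.tika.io.TikaInputStream;
> import org.apache.tika.metadata.Metadata;
> import org.apache.tika.mime.MimeTypes;
>
> public class TikaTest {
> 	public static void main(String[] args) throws Exception {
> 		String userHome = System.getProperty("user.home");
> 		File file = new File(userHome + "/Desktop/sample_iPod.m4v");
> 		InputStream is = TikaInputStream.get(file);
> 		try {
> 			Detector detector = new DefaultDetector(
> 				MimeTypes.getDefaultMimeTypes());
> 			Metadata metadata = new Metadata();
> 			metadata.set(Metadata.RESOURCE_NAME_KEY, file.getName());
> 			// Detectors mark and reset the stream, so it can be passed to detect() twice
> 			System.out.println("File + filename: " + detector.detect(is, metadata));
> 			System.out.println("File only:       " + detector.detect(is, new Metadata()));
> 			// A null stream makes detection rely on the metadata (the filename) alone
> 			System.out.println("Filename only:   " + detector.detect(null, metadata));
> 		} finally {
> 			is.close();
> 		}
> 	}
> }
> {code}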
> Renders the output:
> {code}
> File + filename: video/quicktime
> File only:       video/quicktime
> Filename only:   video/x-m4v
> {code}
> Moreover, if the same test is run against an M4A file, the results are even worse:
> {code}
> File + filename: video/quicktime
> File only:       video/quicktime
> Filename only:   application/octet-stream
> {code}
