tika-dev mailing list archives

From "Tim Allison (Jira)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-3023) Text files starting with MOVI are detected as X-SGI-Movie
Date Wed, 08 Jan 2020 19:47:00 GMT

    [ https://issues.apache.org/jira/browse/TIKA-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17010968#comment-17010968 ]

Tim Allison commented on TIKA-3023:

I regret that I know nothing about this format. 

TRID requires only "MOVI": [http://file-extension.net/seeker/file_extension_SGI]

I couldn't quickly find a signature in Pronom.

{{file}} requires only "MOVI"


From the files linked here: [http://fileformats.archiveteam.org/wiki/SGI_movie], it looks
like they all start with "MOVI\u0000".


Any objections to adding a 0x00 after MOVI?
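
To illustrate the proposed change, here is a minimal sketch in Python of a prefix check with and without the trailing 0x00. It is not Tika's detector code; the function name and both sample byte strings are made up for illustration, and only the first five bytes of the SGI-like sample ("MOVI" plus 0x00) come from the observation above.

```python
# Illustrative only: a plain prefix check, not Tika's actual detection logic.
SGI_MOVIE_MAGIC_OLD = b"MOVI"      # 4D 4F 56 49, the four-byte signature
SGI_MOVIE_MAGIC_NEW = b"MOVI\x00"  # proposed: the same bytes followed by 0x00


def matches_magic(data: bytes, magic: bytes) -> bool:
    """Return True if data begins with the given magic bytes."""
    return data.startswith(magic)


# Hypothetical plain-text content that happens to start with "MOVI".
text_file = b"MOVIE NIGHT\nplain text, not a video\n"

# Hypothetical binary content: only the leading "MOVI\x00" reflects the
# sample files; the remaining bytes are filler.
sgi_like = b"MOVI\x00" + b"\x00" * 16

for name, data in (("text_file", text_file), ("sgi_like", sgi_like)):
    print(name,
          "old:", matches_magic(data, SGI_MOVIE_MAGIC_OLD),
          "new:", matches_magic(data, SGI_MOVIE_MAGIC_NEW))
```

With the four-byte signature both samples match; with the five-byte signature only the binary sample does, because the fifth byte of the text sample is "E" rather than 0x00.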

> Text files starting with MOVI are detected as X-SGI-Movie
> ---------------------------------------------------------
>                 Key: TIKA-3023
>                 URL: https://issues.apache.org/jira/browse/TIKA-3023
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.23
>         Environment: Issue recreated on
> Windows 10 Professional 64bit running the runnable Jar
> Ubuntu 16.04.6 LTS running Tika-Python
>            Reporter: Steve
>            Priority: Minor
>         Attachments: capitalmovie.txt
> If a plaintext file starts with "MOVI" Tika labels it as an SGI Movie.
> The hex conversion for MOVI is 4D 4F 56 49 which is the same as the header for the SGI
> Movie file format
> [https://reposcope.com/mimetype/video/x-sgi-movie]
> This SGI format isn't supported so any information from a text file starting like this
> would be lost. I've attached a simple file that should recreate the problem.

This message was sent by Atlassian Jira