tika-dev mailing list archives

From "Luca Moretti (JIRA)" <j...@apache.org>
Subject [jira] [Created] (TIKA-1823) Support detecting DWF format
Date Tue, 05 Jan 2016 23:32:39 GMT
Luca Moretti created TIKA-1823:

             Summary: Support detecting DWF format
                 Key: TIKA-1823
                 URL: https://issues.apache.org/jira/browse/TIKA-1823
             Project: Tika
          Issue Type: Improvement
          Components: detector, mime
    Affects Versions: 1.11
            Reporter: Luca Moretti
            Priority: Minor

Tika currently detects DWF files as application/octet-stream.
To make Tika's MIME magic detector correctly recognize DWF files, the following code
fragment should be added to the _tika-mimetypes.xml_ registry:

<mime-type type="model/vnd.dwf">
	<_comment>Design Web Format</_comment>
	<magic priority="50">
		<match type="string" offset="0" value="(DWF V">
			<match type="string" offset="8" value=".">
				<match type="string" offset="11" value=")" />
			</match>
		</match>
	</magic>
	<glob pattern="*.dwf" />
</mime-type>

In the current version (DWF 6.0), a DWF file is a ZIP-compressed container for vector-based CAD
drawings. It is basically a ZIP archive with the _(DWF V06.00)_ signature added before the
regular ZIP magic number. For this reason, the match value to detect DWF 6.0 files would be
{{(DWF V06.00)PK}}.
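
A minimal Java sketch of that strict check, assuming the file header has already been read into a byte array. The class and method names ({{Dwf6Check}}, {{isDwf6Package}}) are hypothetical and not part of Tika; the header bytes in {{main}} are placeholders, with {{PK\003\004}} being the standard ZIP local file header magic.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Tika: checks for the strict DWF 6.0 magic,
// i.e. the "(DWF V06.00)" signature immediately followed by the ZIP magic "PK".
public class Dwf6Check {
    // 12-byte signature + 2-byte "PK" = 14 bytes.
    static final byte[] DWF6_PREFIX =
            "(DWF V06.00)PK".getBytes(StandardCharsets.US_ASCII);

    static boolean isDwf6Package(byte[] header) {
        if (header.length < DWF6_PREFIX.length) {
            return false; // too short to contain the full prefix
        }
        // Compare only the first 14 bytes; the rest of the header is ignored.
        byte[] start = Arrays.copyOfRange(header, 0, DWF6_PREFIX.length);
        return Arrays.equals(start, DWF6_PREFIX);
    }

    public static void main(String[] args) {
        // Placeholder headers; "\003\004" completes the ZIP local file header magic.
        byte[] v6 = "(DWF V06.00)PK\003\004".getBytes(StandardCharsets.US_ASCII);
        byte[] v055 = "(DWF V00.55)".getBytes(StandardCharsets.US_ASCII);
        System.out.println(isDwf6Package(v6));   // true
        System.out.println(isDwf6Package(v055)); // false
    }
}
```

This exact prefix rejects pre-6.0 headers such as _(DWF V00.55)_, so on its own it only covers DWF 6.0 packages.
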
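In previous versions, the DWF data transport is not a ZIP file format, so the magic number
is only the _(DWF V00.55)_ signature in the file header.
To make Tika detect DWF files of these versions too, I propose the looser match values in the code
fragment above, which check only the fixed characters of the signature: "(DWF V" at offset 0,
"." at offset 8 and ")" at offset 11.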
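
A rough Java equivalent of the proposed nested matches, as a sketch for checking the offsets by hand: "(DWF V" occupies offsets 0 to 5, the version occupies 6 to 10 with the dot at 8, and the closing parenthesis sits at 11. {{DwfMagic}} and its methods are hypothetical names, not Tika API; the sketch ignores the {{priority}} attribute and, like the XML, does not validate the version digits themselves.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika: mirrors the proposed nested matches
//   offset 0  -> "(DWF V"
//   offset 8  -> "."
//   offset 11 -> ")"
// so any "(DWF Vnn.nn)" header is accepted (e.g. V06.00 and V00.55).
public class DwfMagic {
    private static final byte[] PREFIX = "(DWF V".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] DOT = ".".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] CLOSE = ")".getBytes(StandardCharsets.US_ASCII);

    // Rough equivalent of <match type="string" offset="..." value="...">.
    static boolean matchAt(byte[] data, int offset, byte[] value) {
        if (offset + value.length > data.length) {
            return false; // header too short for this match
        }
        for (int i = 0; i < value.length; i++) {
            if (data[offset + i] != value[i]) {
                return false;
            }
        }
        return true;
    }

    static boolean matchesDwfMagic(byte[] header) {
        // Nested matches: an inner match is only evaluated if the outer one succeeded.
        return matchAt(header, 0, PREFIX)
                && matchAt(header, 8, DOT)
                && matchAt(header, 11, CLOSE);
    }

    public static void main(String[] args) {
        String[] samples = {"(DWF V06.00)PK", "(DWF V00.55)", "%PDF-1.4"};
        for (String s : samples) {
            boolean hit = matchesDwfMagic(s.getBytes(StandardCharsets.US_ASCII));
            System.out.println(s + " -> " + hit);
        }
        // (DWF V06.00)PK -> true
        // (DWF V00.55) -> true
        // %PDF-1.4 -> false
    }
}
```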

P.S.: The DWF format specification is included in the DWF Toolkit, which is available
for free at [http://www.autodesk.com/dwftoolkit].

This message was sent by Atlassian JIRA
