tika-dev mailing list archives

From "Nick Burch (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TIKA-1120) Enable direct use of org.apache.tika.mime.MediaType.detect(...)
Date Wed, 12 Jun 2013 16:02:20 GMT

    [ https://issues.apache.org/jira/browse/TIKA-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13681353#comment-13681353 ]

Nick Burch commented on TIKA-1120:
----------------------------------

The latest detection documentation is at <https://tika.apache.org/1.3/detection.html>.
The URL you referenced is for an older version of Tika.

I don't think people should be doing what your code does. You should really be going
through a TikaConfig object <http://tika.apache.org/1.3/api/org/apache/tika/config/TikaConfig.html>,
and getting either a Detector or the mime types registry from that.
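
For illustration, a minimal sketch of the TikaConfig route, assuming Tika 1.x is on the
classpath. The class name DetectViaTikaConfig and the helper detectType are hypothetical
names chosen for this example, not part of Tika:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.config.TikaConfig;
import org.apache.tika.detect.Detector;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;

// Hypothetical example class; assumes the Tika 1.x API.
public class DetectViaTikaConfig {

    // Hypothetical helper: detects the media type of a stream, using the
    // file name as an additional hint.
    static String detectType(InputStream raw, String fileName) throws IOException {
        // The default config loads the bundled mime type definitions.
        TikaConfig config = TikaConfig.getDefaultConfig();
        Detector detector = config.getDetector();

        Metadata md = new Metadata();
        md.set(Metadata.RESOURCE_NAME_KEY, fileName);

        // TikaInputStream supports mark/reset, which detectors rely on.
        try (TikaInputStream in = TikaInputStream.get(raw)) {
            return detector.detect(in, md).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "hello world".getBytes(StandardCharsets.UTF_8);
        System.out.println(detectType(new ByteArrayInputStream(bytes), "notes.txt"));
    }
}
```

If only the mime types registry is wanted, config.getMimeRepository() should return a
populated MimeTypes instance whose detect(...) can be called the same way.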

Are you able to suggest some tweaks to the most recent documentation that would make this
clearer for someone in your situation?
                
> Enable direct use of org.apache.tika.mime.MediaType.detect(...)
> ---------------------------------------------------------------
>
>                 Key: TIKA-1120
>                 URL: https://issues.apache.org/jira/browse/TIKA-1120
>             Project: Tika
>          Issue Type: Wish
>          Components: mime
>    Affects Versions: 1.3
>            Reporter: Oliver Kopp
>            Priority: Minor
>
> When using mime type detection, the classes allow the following use:
>     try (InputStream is = theInputStream;
>          BufferedInputStream bis = new BufferedInputStream(is);) {
>         MimeTypes mt = new MimeTypes();
>         Metadata md = new Metadata();
>         md.add(Metadata.RESOURCE_NAME_KEY, theFileName);
>         MediaType mediaType = mt.detect(bis, null);
>         return mediaType.toString();
>     }
> When debugging this, the MimeTypes class instantiates its internal patterns with an
> empty MediaTypeRegistry. Therefore, getDefaultMimeTypes() is never called and thus
> tika-mimetypes.xml is never read.
> Is it possible to enable direct usage of MediaType.detect()? For example, by adding a new
> constructor where the MediaTypeRegistry can be set?
> If not, the code comments (or the documentation at https://tika.apache.org/0.10/detection.html)
> should point out that MimeTypes() should not be instantiated directly for mime type detection,
> and that the detectors should be used instead. Possibly, a minimal example should be added
> to make the usage clear.
> The following example works here:
>     try (InputStream is = theInputStream;
>             BufferedInputStream bis = new BufferedInputStream(is);) {
>         AutoDetectParser parser = new AutoDetectParser();
>         Detector detector = parser.getDetector();
>         Metadata md = new Metadata();
>         md.add(Metadata.RESOURCE_NAME_KEY, theFileName);
>         MediaType mediaType = detector.detect(bis, md);
>         return mediaType.toString();
>     }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
