tika-dev mailing list archives

From Tom Grant <tgr...@sms-fed.com>
Subject Appending Mime Types
Date Thu, 18 Aug 2011 22:04:53 GMT
Is there a way to programmatically register new MIME types?  We have a way
to plug in new parsers, but I do not see a way to define new file types.
I'd like to be able to contribute both the MIME type definitions and the
Parser implementations that handle them in a single plugin JAR file.  The
code to update MIME types exists in org.apache.tika.mime.MimeTypesReader, but
that class is package-private.  I would like it to be public, or to have
another class like the one attached that exposes its functionality.  The key
is that I want to keep the standard MIME types and just append or override a
few of my own.  I currently append to the MIME types using:

MimeTypes types = _tikaConfig.getMimeRepository();
MimeTypesAppender appender = new MimeTypesAppender(types);
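
As a rough illustration of what might be handed to the appender, the sketch below builds a tiny definition in the same mime-info XML layout that tika-mimetypes.xml uses. It relies only on the JDK. The class name ExtraTypeDefinition, the type application/x-example, and the *.example extension are all hypothetical, chosen only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class ExtraTypeDefinition {

    // Hypothetical type and extension, used only for illustration.
    static final String XML =
          "<mime-info>"
        + "<mime-type type=\"application/x-example\">"
        + "<glob pattern=\"*.example\"/>"
        + "</mime-type>"
        + "</mime-info>";

    /** Returns the definition as a stream, e.g. for append(InputStream). */
    public static InputStream asStream() {
        return new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8));
    }

    /** Returns the definition as a DOM document, e.g. for append(Document). */
    public static Document asDocument() throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(asStream());
    }
}
```

With the appender created above, the call would then presumably be appender.append(ExtraTypeDefinition.asDocument()), leaving the standard types in place and adding only this one.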

I realize that I can copy the tika-mimetypes.xml file and add my own types,
but that requires me to maintain one master file and to update it every
time someone on my team adds or removes a parser.  I then run the risk of
getting out of sync with the file distributed with Tika.  I think a better
approach might be to add another file under META-INF/ that contains the extra
MIME types that Tika should load.
org.apache.tika.config.ServiceLoader.findServiceResources hints at this
approach, but it doesn't appear to be in place:
MimeTypes.getDefaultMimeTypes() just loads a single file.
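
To make the META-INF idea concrete, here is a minimal sketch, using only the JDK, of how every plugin JAR on the classpath could contribute its own definitions file. The resource name META-INF/tika-extra-mimetypes.xml and the class ExtraMimeTypesLocator are assumptions made for this example; Tika does not define or load such a file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class ExtraMimeTypesLocator {

    // Hypothetical resource name; not something Tika looks for.
    public static final String RESOURCE = "META-INF/tika-extra-mimetypes.xml";

    /** Returns every copy of the named resource visible to the class loader. */
    public static List<URL> find(ClassLoader loader, String name) throws IOException {
        return Collections.list(loader.getResources(name));
    }

    /** Parses each located resource into a DOM document. */
    public static List<Document> load(List<URL> urls)
            throws IOException, ParserConfigurationException, SAXException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        List<Document> docs = new ArrayList<>();
        for (URL url : urls) {
            try (InputStream in = url.openStream()) {
                docs.add(factory.newDocumentBuilder().parse(in));
            }
        }
        return docs;
    }
}
```

Each returned Document could then be passed to something like the append(Document) method of the attached class, after the standard types have been loaded, so that a plugin's definitions are appended rather than replacing the defaults.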


package org.apache.tika.mime;

import java.io.IOException;
import java.io.InputStream;
import org.w3c.dom.Document;

/**
 * Works around the fact that the MimeTypesReader class is package scope.
 */
public class MimeTypesAppender {

    private final MimeTypesReader _reader;

    public MimeTypesAppender(MimeTypes types) {
        this._reader = new MimeTypesReader(types);
    }

    // Assumed: each append delegates to the package-private
    // MimeTypesReader.read(...) overload of the same argument type.
    public void append(Document doc) throws MimeTypeException {
        _reader.read(doc);
    }

    public void append(InputStream is) throws MimeTypeException, IOException {
        _reader.read(is);
    }
}

