tika-dev mailing list archives

From Tom Grant <tgr...@sms-fed.com>
Subject Re: Appending Mime Types
Date Mon, 22 Aug 2011 18:37:02 GMT
Here's the use case that I'm attempting to solve.  I have a customer with
many legacy systems, some of which are completely custom.  These systems
have data files that will never be seen outside of their environment.  For
example, some are XML files with their own schemas.  Some are similar to the
new office documents and are zip files containing XML and other goodies.
Others are serialized objects dumped to disk.  Some are similar to EDI, with
a header and data body at prescribed offsets.  The choices of the past
can't be undone and I'm stuck with about 30 or 40 different file types.  I
want to use Tika as the standard API to exploit those old formats.  The
customer's developers know the internals of the formats; I just need to give
them an API to map them to instead of developing stovepipes to load each
format.  The quantity of file types means that it's going to take a few
months to complete and will happen a few at a time.  So I'd like to
co-locate the mimetype definition with the parser code for maintainability.

How about adding update methods to org.apache.tika.mime.MimeTypesFactory?
This class is public and it's currently the only way to populate a MimeTypes
object.

    /**
     * Updates the provided MimeTypes instance with types read from the
     * specified location.
     * @throws IOException if the stream cannot be read
     * @throws MimeTypeException if the type configuration is invalid
     */
    public static void update(MimeTypes mimeTypes, URL url)
            throws IOException, MimeTypeException {
        InputStream stream = url.openStream();
        try {
            update(mimeTypes, stream);
        } finally {
            stream.close();
        }
    }

    /**
     * Updates the provided MimeTypes instance with types read from the
     * specified input stream.
     * Does not close the input stream.
     * @throws IOException if the stream cannot be read
     * @throws MimeTypeException if the type configuration is invalid
     */
    public static void update(MimeTypes mimeTypes, InputStream inputStream)
            throws IOException, MimeTypeException {
        new MimeTypesReader(mimeTypes).read(inputStream);
    }

    /**
     * Updates the provided MimeTypes instance with types read from the
     * specified document.
     * @throws MimeTypeException if the type configuration is invalid
     */
    public static void update(MimeTypes mimeTypes, Document document)
            throws MimeTypeException {
        new MimeTypesReader(mimeTypes).read(document);
    }
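
To illustrate how per-parser type definitions could be appended a few at a
time, here is a minimal, self-contained sketch.  It deliberately uses a toy
registry (MimeRegistrySketch, a hypothetical class) rather than Tika's
MimeTypes and MimeTypesReader, so it does not assume the proposed update
methods exist.  The type names and file extensions are made up, and only the
mime-type and glob elements of the definition XML are read.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Toy stand-in for a MIME type registry; this is NOT Tika's MimeTypes class. */
public class MimeRegistrySketch {

    // Maps a glob pattern such as "*.ldg" to a type name, in registration order.
    private final Map<String, String> typeByGlob = new LinkedHashMap<>();

    /** Appends the types defined in the stream to the types already registered. */
    public void update(InputStream in) throws IOException {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IOException("Invalid type configuration", e);
        }
        NodeList types = doc.getElementsByTagName("mime-type");
        for (int i = 0; i < types.getLength(); i++) {
            Element type = (Element) types.item(i);
            NodeList globs = type.getElementsByTagName("glob");
            for (int j = 0; j < globs.getLength(); j++) {
                Element glob = (Element) globs.item(j);
                typeByGlob.put(glob.getAttribute("pattern"), type.getAttribute("type"));
            }
        }
    }

    /** Returns the type registered for a file name, or application/octet-stream. */
    public String detect(String fileName) {
        for (Map.Entry<String, String> e : typeByGlob.entrySet()) {
            String pattern = e.getKey();
            // Only simple "*.ext" globs are handled in this sketch.
            if (pattern.startsWith("*") && fileName.endsWith(pattern.substring(1))) {
                return e.getValue();
            }
        }
        return "application/octet-stream";
    }

    /** Builds a one-type definition; stands in for a file shipped next to a parser. */
    static InputStream definition(String type, String glob) {
        String xml = "<mime-info><mime-type type=\"" + type + "\">"
                + "<glob pattern=\"" + glob + "\"/></mime-type></mime-info>";
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        MimeRegistrySketch registry = new MimeRegistrySketch();
        // Each call models one parser module contributing its own definition.
        registry.update(definition("application/x-legacy-ledger", "*.ldg"));
        registry.update(definition("application/x-legacy-edi", "*.ledi"));
        System.out.println(registry.detect("q3.ldg"));
        System.out.println(registry.detect("batch.ledi"));
        System.out.println(registry.detect("notes.txt"));
    }
}
```

The point of the sketch is only that each update call adds to what is
already registered, so a definition can live beside the parser it describes
and be loaded when that parser is ready.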




On Mon, Aug 22, 2011 at 1:00 PM, Nick Burch <nick.burch@alfresco.com> wrote:

> On Thu, 18 Aug 2011, Tom Grant wrote:
>
>> Is there a way to programmatically register new Mime Types?
>>
>
> I think the expectation was that people finding gaps would open a new jira
> entry, and list the details of these mimetypes and then everyone would
> benefit from them!
>
> There shouldn't be many cases where you need to add a few custom mimetypes,
> normally it's just extra ones we don't list.
>
> Nick
>
