tika-dev mailing list archives

From "Jukka Zitting" <jukka.zitt...@gmail.com>
Subject Re: OSGI bundle for Tika
Date Tue, 20 May 2008 08:52:19 GMT

On Mon, May 19, 2008 at 3:05 PM, Yves Zoundi <yveszoundi@gmail.com> wrote:
> It would be nice to create sub-projects from the main Apache Tika Maven
> project. The mime detection part is pretty useful and its code could be
> in a separate project. That would allow people to use it without the
> rest of Tika's code.

I think we can do that. Are you more worried about the size of the
tika jar or all the parser dependencies you don't need?

We might want to split Tika into two parts, say tika-core and
tika-parsers, where tika-core would contain all the core interfaces
and classes with no dependencies on external libraries (except of
course the standard Java 5 class libraries). We could go even further
by partitioning the core library by function, but I'm not sure if that
is worth the extra complexity.

> I removed a few classes from the source code and created a jar with the
> mime detection code. I needed to use Tika in an OSGI environment and it
> was a bit painful to use Tika out of the box (without embedding it in an
> OSGI bundle which would export Tika packages later).
> I had to create a manifest and as Tika's code is not huge, I was able to
> export the packages quickly. I need to import javax.xml.parsers, sax and
> dom packages as Tika uses them to load the mimetypes configuration file.

It should be possible to add the OSGi bundle information automatically
in the normal Maven build. You might want to file an improvement
request for this.
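
To make the manifest entries discussed above concrete, here is a rough
sketch of the headers such a bundle might carry, built with the standard
java.util.jar.Manifest class. The class name, bundle symbolic name and
version are made up for illustration, and the exported package
(org.apache.tika.mime) is an assumption about where the mime detection
code lives; the imported packages are the ones mentioned in the quoted
text. This is not what an automated Maven build would generate, only an
approximation of the hand-written manifest.

```java
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper, not part of Tika: builds the OSGi headers by hand.
class MimeBundleManifest {

    static Manifest build() {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version has to be present, otherwise the main
        // attributes are not written out by Manifest.write().
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.putValue("Bundle-ManifestVersion", "2");
        // Symbolic name and version are placeholders (assumptions).
        main.putValue("Bundle-SymbolicName", "example.tika.mime");
        main.putValue("Bundle-Version", "0.0.1");
        // Assumed package holding the mime detection classes.
        main.putValue("Export-Package", "org.apache.tika.mime");
        // XML packages needed to load the mimetypes configuration file.
        main.putValue("Import-Package",
                "javax.xml.parsers,org.xml.sax,org.w3c.dom");
        return manifest;
    }
}
```

Calling MimeBundleManifest.build().write(out) on an output stream would
produce the text to place in META-INF/MANIFEST.MF of the jar.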

> The thing I didn't see in the mime detection code was a serializer to
> save the mimetypes.

Our use cases so far have had only manual modifications of the
configuration files, but I don't see why we couldn't make it possible
to programmatically modify the configuration. In fact I've already
done some work towards making the media type registry easier to
manage, and a serializer for the configuration file would be a nice
addition. Could you file a feature request for that?

> In a typical application, people usually:
> - Want a mime type configuration file somewhere that they can load
> - Want to be able to add/remove mimetypes
> - Add file extension patterns to existing mime types
> - Store the mime types back to their location.
> So my questions are:
> - If I load the mimetypes from a file, and add some mimetype entries at
> runtime, how can I save back the file without doing it manually with
> dom, jdom or dom4j?

Currently the only way is to modify the XML file directly, but as
mentioned above a higher-level serialization feature would be nice.
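
Until something like that exists, a small helper on top of the standard
javax.xml APIs could cover the "save back" step. The sketch below is
only an illustration: the class MimeTypesXmlWriter and its toXml method
are hypothetical and not part of Tika, and the element and attribute
names (mime-info, mime-type, type, glob, pattern) are an assumption
about the configuration file layout that should be checked against the
actual file. It only handles types and glob patterns, nothing else the
file may contain.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of Tika.
class MimeTypesXmlWriter {

    // Serializes a map of media type name -> glob patterns to XML text.
    static String toXml(Map<String, List<String>> globsByType)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        // Assumed root element name of the configuration file.
        Element root = doc.createElement("mime-info");
        doc.appendChild(root);

        for (Map.Entry<String, List<String>> entry : globsByType.entrySet()) {
            Element type = doc.createElement("mime-type");
            type.setAttribute("type", entry.getKey());
            for (String pattern : entry.getValue()) {
                // One glob element per file name pattern, e.g. "*.txt".
                Element glob = doc.createElement("glob");
                glob.setAttribute("pattern", pattern);
                type.appendChild(glob);
            }
            root.appendChild(type);
        }

        // Write the DOM tree out as indented XML.
        Transformer transformer =
                TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

The returned string could then be written back to wherever the
configuration file was loaded from. A real serializer would work from
the media type registry itself rather than from a plain map.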

> - Would it be possible to create an OSGI bundle for the mime detection
> library?



Jukka Zitting
