nutch-dev mailing list archives

From Kirby Bohling <>
Subject Re: Nutch dev. plans
Date Fri, 17 Jul 2009 22:49:16 GMT
On Fri, Jul 17, 2009 at 5:21 PM, Andrzej Bialecki<> wrote:
> Doğacan Güney wrote:
>>> There's no specific design yet except I can't stand the existing plugin
>>> framework anymore ... ;) I started reading on OSGI and it seems that it
>>> supports the functionality that we need, and much more - it certainly
>>> looks
>>> like a better alternative than maintaining our plugin system beyond 1.x
>>> ...
>> Couldn't agree more with the "can't stand plugin framework" :D
>> Any good links on OSGI stuff?
> I found this:

Plugins are called Bundles in OSGi parlance, but I'll use plugin as
that's the term used by Nutch.

I have done quite a bit of OSGi work (I used to develop RCP
applications for a living).  OSGi is great, as long as you don't plan
on using reflection to look up classes by name, and you don't plan on
using a library that does so.

Pretty much every usage like this:

Class<?> clazz = Class.forName(stringFromConfig);
// Code to create an object using this class...

will fail, unless the code is very classloader aware.  So if you're
going to switch over to using OSGi (which I think would be wonderful),
you'll want to ensure that you can deal with all of the third-party
libraries.  I haven't played much with any of the Declarative Services
stuff (I think that was slated for OSGi, but it might have just been
an Eclipse extension).
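
To make "classloader aware" a bit more concrete, here is a minimal
sketch of the usual workaround: instead of relying on the caller's own
classloader, the code tries an explicit list of candidate loaders.  The
names AwareLoading and tryLoad are made up for illustration, and this
uses only plain JDK calls; it is not an OSGi API.

```java
import java.util.Optional;

final class AwareLoading {
    /**
     * Tries each candidate loader in order and returns the first class found.
     * Null entries are skipped, since a thread's context loader may be unset.
     * (Hypothetical helper for illustration only.)
     */
    static Optional<Class<?>> tryLoad(String className, ClassLoader... candidates) {
        for (ClassLoader loader : candidates) {
            if (loader == null) {
                continue;
            }
            try {
                // 'false' means: locate the class but do not initialize it yet.
                return Optional.of(Class.forName(className, false, loader));
            } catch (ClassNotFoundException | LinkageError e) {
                // Not visible from this loader; fall through to the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        ClassLoader own = AwareLoading.class.getClassLoader();
        System.out.println(tryLoad("java.util.ArrayList", context, own).isPresent());
        System.out.println(tryLoad("no.such.Parser", context, own).isPresent());
    }
}
```

A third-party library that hard-codes the one-argument Class.forName
gives you no place to pass in such a list, which is where the trouble
starts.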

We managed to get most of the code to play nice, and had a few
horrific hacks for allowing the use of Spring if necessary.

OSGi uses classloader segmentation to allow multiple conflicting
versions of the same code inside the same project.  Take a layout
like:

Plugin A: nutch.api (Which contains say the interface Parser { })
Plugin B: parser.word (which has class WordParser implements Parser)

Plugin B has to depend on Plugin A so it can see the Parser interface.
In this case, Plugin A can't have code that calls
Class.forName("WordParser"), because A cannot see B's classes.

OSGi changes the default classloader delegation: you can only see
classes in plugins you depend upon, and cycles in the dependencies are
not allowed.
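
As a rough illustration of that visibility rule, the toy loader below
only delegates to its parent for packages it has explicitly "imported"
and refuses everything else.  This is a sketch using plain JDK
classloaders, not how an OSGi framework is actually implemented; the
names VisibilityDemo, ImportOnlyLoader and canSee are invented for the
example.

```java
import java.util.Set;

final class VisibilityDemo {
    /** Toy loader: delegates to the parent only for explicitly imported packages. */
    static final class ImportOnlyLoader extends ClassLoader {
        private final Set<String> importedPackages;

        ImportOnlyLoader(ClassLoader parent, Set<String> importedPackages) {
            super(parent);
            this.importedPackages = importedPackages;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            int lastDot = name.lastIndexOf('.');
            String pkg = lastDot < 0 ? "" : name.substring(0, lastDot);
            if (!importedPackages.contains(pkg)) {
                // Simulates "you can only see classes in plugins you depend upon".
                throw new ClassNotFoundException(name + " is not in an imported package");
            }
            return super.loadClass(name, resolve);
        }
    }

    /** Returns true if the given loader can resolve the class name. */
    static boolean canSee(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader plugin = new ImportOnlyLoader(
                ClassLoader.getSystemClassLoader(), Set.of("java.util"));
        System.out.println(canSee(plugin, "java.util.ArrayList"));
        System.out.println(canSee(plugin, "java.util.concurrent.ConcurrentHashMap"));
    }
}
```

The second lookup is rejected even though the class is on the
classpath, because its package was never declared as a dependency.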

If you want to do that, you end up having to do:

// ParserRegistry is a registry you would write yourself.
ClassLoader loader = ParserRegistry.lookupPlugin("WordParser");
Class<?> clazz = Class.forName("WordParser", true, loader);

OSGi has an SPI-like way for a plugin to declare that it contributes
an implementation of the Parser interface.  Eclipse builds on top of
it, and that's what Eclipse 3.x implemented its
Extension/ExtensionPoint system on.  I believe they are called
services in "raw" OSGi.
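
The shape of that pattern, stripped of any framework, looks roughly
like the sketch below: the plugin that owns the implementation
registers an instance against the interface, and the API plugin looks
implementations up by interface type instead of by class name.  This is
a hand-rolled, in-memory stand-in and not the org.osgi.framework API;
ServiceSketch, Registry, register and lookup are all hypothetical
names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

final class ServiceSketch {
    /** Would live in "Plugin A" (nutch.api) in the layout described above. */
    interface Parser {
        String parse(String input);
    }

    /** Would live in "Plugin B" (parser.word). */
    static final class WordParser implements Parser {
        @Override
        public String parse(String input) {
            return "word:" + input;
        }
    }

    /** Hypothetical in-memory registry keyed by interface type. */
    static final class Registry {
        private final Map<Class<?>, List<Object>> services = new ConcurrentHashMap<>();

        <T> void register(Class<T> type, T implementation) {
            services.computeIfAbsent(type, k -> new CopyOnWriteArrayList<>())
                    .add(implementation);
        }

        <T> List<T> lookup(Class<T> type) {
            List<T> result = new ArrayList<>();
            for (Object service : services.getOrDefault(type, List.of())) {
                result.add(type.cast(service));
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        // Plugin B contributes its implementation when it starts up.
        registry.register(Parser.class, new WordParser());
        // Plugin A consumes it without ever naming WordParser.
        for (Parser parser : registry.lookup(Parser.class)) {
            System.out.println(parser.parse("doc"));
        }
    }
}
```

The point of the design is that the instance is created inside the
plugin that can see the class, so the consumer never needs
Class.forName at all.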

It's not a huge deal to write that yourself for APIs you implement.
The problem is that it can be difficult to integrate really useful
third-party libraries that don't account for this change in
classloader behaviour.  At points it can be very problematic to use a
specific XML parser that has the features you want (or that some
library you want to use insists on), because such libraries do this
sort of thing all the time.

I'm guessing that Tika isn't ready for this.  Given that it's an
Apache and/or Lucene project, it can probably be addressed.  My guess
is that a number of the libraries it depends upon won't be ready
either.

You can use fragments to get away from that (a fragment requires a
host bundle, and the fragment's classes are loaded using the same
classloader as the host), but doing that defeats a lot of the
reason for using OSGi (at least in terms of allowing you to use
multiple conflicting libraries in the same application).


> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>  Contact: info at sigram dot com
