nutch-dev mailing list archives

From Andrzej Bialecki
Subject Re: Nutch dev. plans
Date Wed, 29 Jul 2009 15:23:18 GMT
Kirby Bohling wrote:
> 2009/7/29 Doğacan Güney:
>> Hey guys,
>> Kirby, thanks for all the insightful information! I couldn't comment
>> much as most of the stuff went right over my head :) (don't know much
>> about OSGi).
> A bit of a progress report to make sure I'm heading in the proper direction.
> I'm learning I know a lot about Eclipse RCP, which hid lots of details
> about OSGi from me.  No public code yet, I'm hoping that happens this
> weekend.
> Got Felix downloaded and started an embedded OSGi environment
> successfully.  I chose Felix because it's Apache licensed.  I'm not
> clear if the CPL/EPL is acceptable to the ASF for inclusion and
> distribution.

I need to check this - it probably is ok to include it and distribute 
it, with a notice.

>  Sounds like Equinox is more full featured.  I'll
> probably integrate with both just for sanity checking portability.

This book indicates that Equinox (and Eclipse) historically preferred 
extensions over services, so the toolchain available in Equinox is more 
robust for building extensions. Unless I got something mixed up ;)

My understanding is that in Nutch we want primarily services, not 
extensions. Although I'm not sure about the pre-/post-processing plugins 
such as query filters and indexing filters, as well as library-only 
plugins like lib-xml.

> My quick research indicates that Hadoop isn't OSGi application
> friendly, but can host an embedded OSGi environment.  I bogged down
> attempting to integrate the bnd tool to run inside of Ant.  I think I
> have that resolved so I can just wrap third-party jars in the Ant
> scripts.  So hopefully I can make more meaningful progress soon.
> The current mental architecture I have is to make all the libraries in
> ./lib/*.jar end up in the top level classloader outside the OSGi
> environment (I forget the technical OSGi name, I think it is the
> System Classloader).
> Then, turn the Nutch Core into a single bundle.
> Turn each current plugin into an OSGi bundle.  Each plugin registers a
> "service" which is a factory capable of creating whatever is currently
> inside of the plugin.xml as an "implementation" attribute.  Modify the
> core to use the factories inside of the extension point code.  I think
> that is the minimally invasive way to get to OSGi.
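
A minimal sketch of the factory-per-plugin idea quoted above, in plain Java
rather than against the real OSGi API. FactoryRegistry, Protocol,
SimpleProtocol and the description strings are hypothetical names made up for
illustration; only the plugin ids protocol-http and protocol-httpclient come
from this thread. In an actual bundle the registration would presumably go
through BundleContext.registerService from a BundleActivator, and the map here
only stands in for that registry. Factories are keyed by extension point and
plugin id, so several plugins can provide the same service.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical plain-Java stand-in for a service registry (not the OSGi API).
class FactoryRegistry {
    // extension point interface -> (plugin id -> factory for that plugin's implementation)
    private final Map<Class<?>, Map<String, Supplier<?>>> byPoint = new LinkedHashMap<>();

    // Called once per plugin, where plugin.xml would name an "implementation" class.
    <T> void register(Class<T> point, String pluginId, Supplier<? extends T> factory) {
        byPoint.computeIfAbsent(point, k -> new LinkedHashMap<>()).put(pluginId, factory);
    }

    // All plugin ids that registered a factory for this extension point, in registration order.
    List<String> providers(Class<?> point) {
        return new ArrayList<>(byPoint.getOrDefault(point, Collections.emptyMap()).keySet());
    }

    // The core asks a specific plugin's factory for a fresh instance.
    <T> T create(Class<T> point, String pluginId) {
        Supplier<?> factory = byPoint.getOrDefault(point, Collections.emptyMap()).get(pluginId);
        if (factory == null) {
            throw new IllegalArgumentException(
                "no " + point.getSimpleName() + " factory registered by " + pluginId);
        }
        return point.cast(factory.get());
    }
}

class FactoryRegistryDemo {
    // Hypothetical, heavily simplified extension point interface.
    interface Protocol {
        String describe();
    }

    // Hypothetical implementation used by both example plugins.
    static final class SimpleProtocol implements Protocol {
        private final String description;

        SimpleProtocol(String description) {
            this.description = description;
        }

        @Override
        public String describe() {
            return description;
        }
    }

    // Two plugins register factories for the same extension point.
    static FactoryRegistry sampleRegistry() {
        FactoryRegistry registry = new FactoryRegistry();
        registry.register(Protocol.class, "protocol-http",
            () -> new SimpleProtocol("plain HTTP"));
        registry.register(Protocol.class, "protocol-httpclient",
            () -> new SimpleProtocol("HTTP via HttpClient"));
        return registry;
    }

    public static void main(String[] args) {
        FactoryRegistry registry = sampleRegistry();
        for (String pluginId : registry.providers(Protocol.class)) {
            System.out.println(pluginId + ": "
                + registry.create(Protocol.class, pluginId).describe());
        }
    }
}
```

A plugin that implements several services would simply call register once per
extension point. How the core picks among competing providers (configuration,
service properties, ranking) is left open here.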

I'm not sure about this part. There may be many plugins that implement 
the same service. E.g. both protocol-http and protocol-httpclient 
implement HTTP protocol service. Some plugins implement many services 
(e.g. creativecommons implements both parsing, indexing and query 

> I assumed that it would be easiest to get OSGi in and integrated with
> minimal disruption.  If nothing else, we can use that as a staging
> point to play with more invasive designs and architectures.  In the
> long run, I think something more invasive makes more sense.  Hopefully
> that is effective risk management rather than a waste of time.
> Please course correct me if any of this seems a bad idea.

It would be fantastic to have something like this as a starting point. 
Other developers can join this effort as soon as they understand the 
general design, and this prototype should provide a good example and 
verify the design.

Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
Contact: info at sigram dot com
