nutch-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Nutch dev. plans
Date Wed, 29 Jul 2009 15:23:18 GMT
Kirby Bohling wrote:
> 2009/7/29 Doğacan Güney <dogacan@gmail.com>:
>> Hey guys,
>>
>> Kirby, thanks for all the insightful information! I couldn't comment
>> much as most of
>> the stuff went right over my head :) (don't know much about OSGI).
>>
> 
> A bit of a progress report to make sure I'm heading in the proper direction.
> 
> I'm learning I know a lot about Eclipse RCP, which hid lots of details
> about OSGi from me.  No public code yet, I'm hoping that happens this
> weekend.
> 
> Got Felix downloaded and started an embedded OSGi environment
> successfully.  I chose Felix because it's Apache licensed.  I'm not
> clear if the CPL/EPL is acceptable to the ASF for inclusion and
> distribution.

I need to check this - it probably is ok to include it and distribute 
it, with a notice.

>  Sounds like Equinox is more full-featured.  I'll
> probably integrate with both just for sanity checking portability.

http://s3.amazonaws.com/neilbartlett.name/osgibook_preview_20090110.pdf

This book indicates that Equinox (and Eclipse) historically preferred 
extensions over services, so the toolchain available in Equinox is more 
robust for building extensions. Unless I got something mixed up ;)

My understanding is that in Nutch we want primarily services, not 
extensions. Although I'm not sure about the pre-/post-processing plugins 
such as query filters and indexing filters, as well as library-only 
plugins like lib-xml.

> My quick research indicates that Hadoop isn't OSGi application
> friendly, but can host an embedded OSGi environment.  I bogged down
> attempting to integrate the bnd tool to run inside of Ant.  I think I
> have that resolved so I can just wrap third-party jars in the Ant
> scripts.  So hopefully I can make more meaningful progress soon.
> 
> The current mental architecture I have is to make all the libraries in
> ./lib/*.jar end up in the top level classloader outside the OSGi
> environment (I forget the technical OSGi name, I think it is the
> System Classloader).

+1.
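
To make the proposed layout concrete, here is a minimal plain-Java sketch of "everything in ./lib/*.jar lives in a loader above the bundles". It uses only java.net.URLClassLoader and no OSGi framework. The class name, the helper loaderForLibDir and the empty "bundle" loader are hypothetical names for illustration. A real framework would more likely expose the host-side packages through the system bundle (for example with the org.osgi.framework.system.packages.extra framework property) than through simple parent-first delegation.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the proposed layout; no OSGi framework is involved.
class LibClassLoaderSketch {

    // Builds one loader over every *.jar in libDir; lookups go to parent first.
    static URLClassLoader loaderForLibDir(File libDir, ClassLoader parent) {
        List<URL> urls = new ArrayList<>();
        File[] jars = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars != null) { // listFiles returns null when libDir is not a directory
            for (File jar : jars) {
                try {
                    urls.add(jar.toURI().toURL());
                } catch (MalformedURLException e) {
                    throw new IllegalArgumentException("bad jar path: " + jar, e);
                }
            }
        }
        return new URLClassLoader(urls.toArray(new URL[0]), parent);
    }

    public static void main(String[] args) {
        ClassLoader host = LibClassLoaderSketch.class.getClassLoader();
        URLClassLoader libLoader = loaderForLibDir(new File("lib"), host);

        // Hypothetical stand-in for a bundle's loader: it has no jars of its own,
        // so every lookup falls through to libLoader and then to the host loader.
        URLClassLoader bundleLoader = new URLClassLoader(new URL[0], libLoader);

        System.out.println("lib jars found: " + libLoader.getURLs().length);
        System.out.println("bundle delegates to lib loader: "
                + (bundleLoader.getParent() == libLoader));
    }
}
```

The point of the sketch is only the shape of the hierarchy: shared libraries are loaded once, above the plugins, and each plugin-level loader sees them without bundling its own copy.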

>  Then, turn the Nutch Core into a single bundle.

+1.

> Turn each current plugin into an OSGi bundle.  Each plugin registers a
> "service" which is a factory capable of creating whatever is currently
> inside of the plugin.xml as an "implementation" attribute.  Modify the
> core to use the factories inside of the extension point code.  I think
> that is the minimally invasive way to get to OSGi.

I'm not sure about this part. There may be many plugins that implement 
the same service. E.g. both protocol-http and protocol-httpclient 
implement the HTTP protocol service. Some plugins implement many services 
(e.g. creativecommons implements parsing, indexing and query 
components).
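
The concern above can be sketched in plain Java: a registry in which several plugins register a factory for the same service, and one plugin registers factories for several services. This is a toy model, not the OSGi service API. The interfaces Protocol, Parser, IndexingFilter and QueryFilter are hypothetical stand-ins for the Nutch extension points, and the returned strings are placeholders for real behaviour.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Plain-Java model of "each plugin registers factories"; not the OSGi API.
class PluginRegistrySketch {

    // Hypothetical stand-ins for Nutch extension-point interfaces.
    interface Protocol { String fetch(String url); }
    interface Parser { String parse(String content); }
    interface IndexingFilter { String filter(String doc); }
    interface QueryFilter { String rewrite(String query); }

    // One registration: which plugin offered the factory, and the factory itself.
    static final class Registration<T> {
        final String pluginId;
        final Supplier<? extends T> factory;
        Registration(String pluginId, Supplier<? extends T> factory) {
            this.pluginId = pluginId;
            this.factory = factory;
        }
    }

    private final Map<Class<?>, List<Registration<?>>> services = new LinkedHashMap<>();

    <T> void register(Class<T> service, String pluginId, Supplier<? extends T> factory) {
        services.computeIfAbsent(service, k -> new ArrayList<>())
                .add(new Registration<T>(pluginId, factory));
    }

    // Ids of every plugin offering the service, in registration order.
    List<String> providers(Class<?> service) {
        List<String> ids = new ArrayList<>();
        for (Registration<?> r : services.getOrDefault(service, Collections.emptyList())) {
            ids.add(r.pluginId);
        }
        return ids;
    }

    // The caller names the plugin, so the registry never has to guess a provider.
    <T> T create(Class<T> service, String pluginId) {
        for (Registration<?> r : services.getOrDefault(service, Collections.emptyList())) {
            if (r.pluginId.equals(pluginId)) {
                return service.cast(r.factory.get());
            }
        }
        throw new IllegalArgumentException(
                pluginId + " does not provide " + service.getSimpleName());
    }

    static PluginRegistrySketch demo() {
        PluginRegistrySketch reg = new PluginRegistrySketch();
        // Two plugins, one service.
        reg.register(Protocol.class, "protocol-http", () -> url -> "http:" + url);
        reg.register(Protocol.class, "protocol-httpclient", () -> url -> "httpclient:" + url);
        // One plugin, three services.
        reg.register(Parser.class, "creativecommons", () -> c -> "cc-parsed:" + c);
        reg.register(IndexingFilter.class, "creativecommons", () -> d -> "cc-indexed:" + d);
        reg.register(QueryFilter.class, "creativecommons", () -> q -> "cc-query:" + q);
        return reg;
    }

    public static void main(String[] args) {
        PluginRegistrySketch reg = demo();
        System.out.println(reg.providers(Protocol.class));
        System.out.println(reg.create(Protocol.class, "protocol-httpclient").fetch("example.org"));
    }
}
```

In this model a lookup by service type alone is ambiguous for Protocol, so the caller has to name the provider. In OSGi that choice is usually made with service properties and filters rather than a plugin id, but the design question is the same: a one-factory-per-plugin mapping does not cover either case shown in demo().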

> 
> I assumed that it would be easiest to get OSGi in and integrated with
> minimal disruption.  If nothing else, we can use that as a staging
> point to play with more invasive designs and architectures.  In the
> long run, I think something more invasive makes more sense.  Hopefully
> that is effective risk management rather than a waste of time.
> 
> Please course correct me if any of this seems a bad idea.

It would be fantastic to have something like this as a starting point. 
Other developers can join this effort as soon as they understand the 
general design, and this prototype should provide a good example and 
verify the design.

-- 
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

