nutch-dev mailing list archives

From Stefan Groschupf ...@media-style.com>
Subject Re: [Nutch-dev] I made parse-rss work, but ... Re: Huge Problem trying to develop plugin for Nutch
Date Mon, 28 Mar 2005 17:16:30 GMT
Chris, John,

Since I was not able to find the sources, I simply wrote my own RSS 
parser plugin and contributed it.
You can find it here:
http://issues.apache.org/jira/browse/NUTCH-30
I hope it is OK to fix your problem this way. :-)

If you like it, please vote for the issue.


Some comments about XML and Java in general:
XML support in Java is a pain,
especially in container applications, such as a plugin system, that have 
their own class-loading model.
You will find a great many postings about such problems, e.g. when using 
XML in webapps within JBoss or Tomcat.

In any case, it is always related to class loading: incompatible 
versions of the libraries may be in the normal JDK lib folder, in the 
endorsed folder, or inside any other jar that is on the class path.
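
When two versions of a library may be visible at once, it can help to 
check which jar and which class-loader a class actually came from. The 
following is only a minimal sketch using standard JDK calls; the class 
name ClassOrigin and the method describe are made up for illustration 
and are not part of nutch.

```java
import java.security.CodeSource;

/** Hypothetical helper (not part of nutch): reports where a class was loaded from. */
class ClassOrigin {

    /** Returns "class name <- jar or directory via class-loader". */
    static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes loaded by the bootstrap loader usually have no code source.
        String location = (source == null || source.getLocation() == null)
                ? "(no code source, probably the JDK itself)"
                : source.getLocation().toString();
        // getClassLoader() returns null for bootstrap classes; it is printed as "null".
        return type.getName() + " <- " + location + " via " + type.getClassLoader();
    }
}
```

Calling ClassOrigin.describe(someObject.getClass()) from inside a plugin 
and from the core should show whether both see the same copy of the 
library.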

From:
http://wiki.media-style.com/pages/viewpage.action?pageId=1154

At initialization, the class-loader of a plugin is assigned all jar 
libraries that are defined in the manifest file. Besides these 
'local' libraries, the dependency chain of a plugin is analyzed, and 
all jar libraries defined as public are assigned to the class-loader as 
well.
  When a class then tries to load another class at runtime, we first try 
to load the class from the plugin's class-loader. If loading the 
class from the plugin's class-loader fails, we forward the class load 
request to the parent of the plugin class-loader.

This parent is the class-loader of the nutch tool that started the plugin 
system.
So for the back end it is the runtime class-loader of Java; in the case 
of the user interface it is the class-loader of the Tomcat webapp.
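
The lookup order described above (plugin class-loader first, then its 
parent) can be sketched roughly as follows. This is not the actual nutch 
plugin class-loader; the name PluginFirstClassLoader is an assumption 
made up for this example, and it only illustrates the delegation order 
with standard JDK classes.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Hypothetical sketch, not the nutch implementation: asks the plugin's jars first. */
class PluginFirstClassLoader extends URLClassLoader {

    PluginFirstClassLoader(URL[] pluginJars, ClassLoader parent) {
        super(pluginJars, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // Reuse a class this loader has already defined.
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                try {
                    // Step 1: look in the plugin's own jar libraries.
                    loaded = findClass(name);
                } catch (ClassNotFoundException notInPlugin) {
                    // Step 2: forward the request to the parent class-loader.
                    loaded = super.loadClass(name, false);
                }
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }
}
```

With this order, a class found in the plugin's own jars and the same 
class loaded through the parent are two different classes to the JVM, 
which is one way the conflicts mentioned above can show up.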


Hope that helps, at least we have a working rss parser now. :-)

Stefan


On 27.03.2005 at 11:12, John X wrote:

> Chris,
>
> I made plugin parse-rss work by
>
> (1) installing jdom.jar under $nutch_top/lib,
> instead of $nutch_top/src/plugin/parse-rss/lib
> (2) using jaxen-{core,jdom}.jar, instead of jaxen-full.jar.
> Related, there are some hacks necessary in commons-feedparser,
> mostly reflecting api changes for XPath.
>
> (1) above is puzzling. I got the same error as you did,
> if jdom.jar is placed under the plugin's own lib dir.
> I am not sure whether it is caused by a possible bug in the nutch plugin core,
> or a namespace conflict in some jars, or something else.
>
> Stefan (Groschupf): could you please enlighten us on possible causes?
>
> One note: there is a tool called net.nutch.parse.ParserChecker, that
> you can use to debug parser plugins. It is more convenient
> to use it than to start a crawler.
>
> Will you be able to contribute this plugin after the dust settles?
>
> Best,
>
> John
>
> On Sat, Mar 26, 2005 at 01:32:34PM -0800, CHRIS A MATTMANN wrote:
>> Hi John,
>>
>>   I posted it earlier as a .txt file, but since it's small I could 
>> just include it in this email:
>>
>>
>> import java.net.URL;
>> import java.net.URLClassLoader;
>>
>>
>
>
---------------------------------------------------------------
company:		http://www.media-style.com
forum:		http://www.text-mining.org
blog:			http://www.find23.net

