nutch-dev mailing list archives

From Mike Cannon-Brookes <mcannonbroo...@gmail.com>
Subject Re: Nutch design queries
Date Thu, 15 Dec 2005 22:24:05 GMT
Filed as http://issues.apache.org/jira/browse/NUTCH-142

I didn't think there was much point creating a patch for a one-line fix :)

m

On 12/16/05, Mike Cannon-Brookes <mcannonbrookes@gmail.com> wrote:
> Wow - great responses all.
>
> 0.7 vs 0.8 - apologies if I'm using an old version. I'm using the
> latest binary release. I'll switch to latest SVN HEAD and see how that
> works in my application.
>
> Is there any concrete timeline on 0.8?
>
> I'm very glad to see the statics generally being reduced. I also
> personally (!!) would remove the Nutch configuration system completely
> in favour of Spring - I believe you'd get a lot more power for very
> little investment of time - but I realise that's a much more drastic
> step for a code-newbie to suggest :)
>
> The directory listing in a J2EE application is a problem. Why do you
> need to get a directory listing? The way we load plugins in J2EE is to
> say "find me all resources named /plugin.xml", then load each of those
> XML files, then from there load the relevant classes etc as indicated.
>
> Our plugin system has a series of different 'plugin loaders' that
> handle the different strategies, which I think might work well here.
> So far we have a loader for specific files, a loader for all plugins
> in directory X, a loader which scans the file system regularly, a
> loader which uses the classpath as above etc. This puts the
> flexibility in the hands of the developer as to the restrictions their
> application will have.
>
> I'll get to work on that patch - back in a little.
>
> m
>
> PS Is anyone actively working on the wiki? It seems a little out of
> date and there's a lot more information in the mailing list. It would
> be awesome if someone would regularly trawl the mailing list for
> 'tidbits' (call them "Doug's Droppings of Wisdom"?) and wiki-ise them
> for new users. Mailing list archives are always a crap way to find
> information.
>
> On 12/16/05, Doug Cutting <cutting@nutch.org> wrote:
> > Mike Cannon-Brookes wrote:
> > > Hey guys,
> >
> > Hi, Mike!  Welcome.
> >
> > > - Classloading - I have had many problems with NutchConf due to the
> > > way it loads its resources. In a J2EE scenario, it's simply evil :)
> > > Would there be any great problem with switching its classloader to
> > > Thread.currentThread().getContextClassLoader() instead of the current
> > > static classloader? It's a lot 'friendlier' to do it this way. I can
> > > submit a patch to do this very quickly if others are keen (or anyone
> > > can do it - I've done it locally, takes about 30 keystrokes!)
> >
> > That's not a problem.  Please submit a patch.  Attach it to a bug report
> > (if you know how to use Jira!).
> >
> > > - Statics - On that issue, there are an awful lot of static classes
> > > and methods around. This makes configuring and using Nutch in 'non
> > > standard' ways difficult as things are hard coded together (for
> > > example I can't easily swap out NutchConf to do my own configuration
> > > mechanism as it's all static accesses!). Is there any interest in
> > > removing / refactoring these statics out to make Nutch more flexible?
> >
> > Yes, that's a goal.  I'd like to seriously attack it after we merge the
> > mapred branch to trunk, probably next month.
> >
> > I made a proposal in this vein almost a year ago:
> >
> > http://www.mail-archive.com/nutch-dev@incubator.apache.org/msg00196.html
> >
> > Note also that mapred's JobConf is always used dynamically, so all of
> > the new mapred-based code can be dynamically configured.  The biggest
> > thing left to fix is plugins.  I think perhaps each plugin factory
> > method should take a configuration.
> >
> > > - Plugins / physical files - Quite a lot of stuff in Nutch seems to
> > > rely on physical files (for example plugins are loaded by looking for
> > > the "/plugins" directory on disk IIRC). In a J2EE environment, this
> > > means you can't deploy the WAR as a non-expanded WAR for example. Can
> > > we switch from loading files directly to loading resources as streams?
> > > This means you can load a file from the classloader regardless of
> > > whether or not it exists as a physical file.
> >
> > The problem is that we sometimes need to list directories, e.g., to find
> > out what resources are available.  Is there a J2EE-safe way to do that?
> >
> > Cheers,
> >
> > Doug
> >
>
>
> --
> ATLASSIAN - http://www.atlassian.com
>
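
For readers following the classloading point quoted above, here is a minimal sketch of looking up a resource through the thread context class loader and reading it as a stream rather than as a physical file. It is not the NUTCH-142 patch or NutchConf's actual code. The class name ContextResourceLoader, its method names, and the resource name "nutch-default.xml" are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Nutch.
final class ContextResourceLoader {

    /** Prefer the thread context class loader; fall back to this class's own loader. */
    public static ClassLoader pickClassLoader() {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        return (cl != null) ? cl : ContextResourceLoader.class.getClassLoader();
    }

    /** Opens a named classpath resource as a stream, or returns null if it is not found. */
    public static InputStream open(String name) {
        return pickClassLoader().getResourceAsStream(name);
    }

    public static void main(String[] args) throws IOException {
        // The resource name is an example only; any classpath resource works.
        try (InputStream in = open("nutch-default.xml")) {
            System.out.println(in != null ? "found" : "not on classpath");
        }
    }
}
```

In a servlet container the context class loader is normally the web application's own loader, so the same lookup should work whether or not the WAR is expanded on disk.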


--
ATLASSIAN - http://www.atlassian.com
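
The "find me all resources named /plugin.xml" strategy described in the quoted message could look roughly like the sketch below. It uses the standard ClassLoader.getResources call. The class name ClasspathPluginLocator and the assumption that each plugin ships a plugin.xml at the root of its own jar or classpath directory are illustrative, not how Nutch's plugin repository works.

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical locator, not part of Nutch or of the plugin system mentioned in the thread.
final class ClasspathPluginLocator {

    /** Returns the URL of every resource with the given name that the loader can see. */
    public static List<URL> findDescriptors(ClassLoader loader, String name) throws IOException {
        List<URL> found = new ArrayList<>();
        // Class loader resource names are written without a leading slash.
        Enumeration<URL> urls = loader.getResources(name);
        while (urls.hasMoreElements()) {
            found.add(urls.nextElement());
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        for (URL url : findDescriptors(loader, "plugin.xml")) {
            // url.openStream() would hand each descriptor to an XML parser.
            System.out.println("plugin descriptor: " + url);
        }
    }
}
```

This avoids listing a directory altogether, but it only finds one descriptor per classpath entry, so each plugin would need to be packaged as its own jar or directory.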
