nutch-dev mailing list archives

From Jérôme Charron
Subject Re: Static initializers
Date Tue, 20 Dec 2005 14:19:21 GMT

How do you choose which NutchConf to use?
Here is a short discussion I had with Doug about a kind of dynamic NutchConf
inside the same JVM:

"... Looking at the mailing list archives, it seems that making some
behavior depend on the document's URL is a recurring problem (for instance,
boosting documents that match a URL pattern - the NUTCH-16 issue, and many
other topics).
So, our idea is to provide a "dynamic" Nutch configuration (one that
overrides the default one, as nutch-site does) based on the URL patterns
that documents match. The idea is as follows:

1. The default configuration is, as usual, the nutch-default.xml file

2. An XML file can map URL regexps to other configuration files (that
override nutch-default):
  <url regexp="*">
    <!-- A set of nutch properties that override the nutch-default for this domain -->

What do you think about this?
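
As a rough illustration of the lookup described in the two steps above, here
is a minimal Java sketch. It is not Nutch code: the class UrlConfResolver, its
methods addRule and resolve, and the property name used in the usage note are
all hypothetical, and plain java.util.Properties stands in for NutchConf. It
assumes that every rule whose regexp matches the URL is applied in
declaration order, so later rules win.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;

/** Hypothetical resolver: layers per-URL overrides on top of default properties. */
class UrlConfResolver {

    /** One mapping entry: a URL regexp plus the properties that override the defaults. */
    private static final class Rule {
        final Pattern pattern;
        final Properties overrides;

        Rule(Pattern pattern, Properties overrides) {
            this.pattern = pattern;
            this.overrides = overrides;
        }
    }

    private final Properties defaults;
    private final List<Rule> rules = new ArrayList<>();

    UrlConfResolver(Properties defaults) {
        this.defaults = defaults;
    }

    /** Registers overrides for every URL in which the regexp is found. */
    void addRule(String urlRegexp, Properties overrides) {
        rules.add(new Rule(Pattern.compile(urlRegexp), overrides));
    }

    /** Returns a fresh copy of the defaults with all matching overrides applied in order. */
    Properties resolve(String url) {
        Properties effective = new Properties();
        effective.putAll(defaults);
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                effective.putAll(rule.overrides);
            }
        }
        return effective;
    }
}
```

With a made-up property such as "boost.factor" set to "1.0" in the defaults
and to "2.0" in a rule for "^https?://example\\.org/", resolve() would return
the overridden value for URLs on that host and the default for everything else.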

> Looking deeper, this is messier than I thought... Some changes would
> be required to the plugin instantiation mechanisms, e.g.:
>     Extension.getExtensionInstance() -> getExtensionInstance(NutchConf)
>     ExtensionPoint.getExtensions() -> getExtensions(NutchConf)
>     PluginRepository.getExtensionPoint(String) ->
> getExtensionPoint(String, NutchConf)
> etc, etc...
> The way this would work would be similar to the mechanism described
> above: if plugin instances are not created yet, they would be created
> once (based on the current NutchConf argument), and then cached in this
> NutchConf instance.
> And also the plugin implementations would have to extend
> NutchConfigured, taking NutchConf as the argument to their constructors
> - because now the Extension.getExtensionInstance would pass the current
> NutchConf instance to their constructors.
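
To make the quoted proposal concrete, here is a minimal Java sketch of the
create-once-then-cache behavior. All names are stand-ins invented for
illustration (SketchConf, SketchConfigured, SketchExtension, SketchParser);
the real NutchConf, NutchConfigured and Extension classes are not reproduced
here, and the cache methods getObject/setObject are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Stand-in for NutchConf: models only a per-configuration object cache. */
class SketchConf {
    private final Map<String, Object> cache = new HashMap<>();

    synchronized Object getObject(String key) {
        return cache.get(key);
    }

    synchronized void setObject(String key, Object value) {
        cache.put(key, value);
    }
}

/** Stand-in for NutchConfigured: plugins receive their configuration at construction. */
abstract class SketchConfigured {
    private final SketchConf conf;

    protected SketchConfigured(SketchConf conf) {
        this.conf = conf;
    }

    SketchConf getConf() {
        return conf;
    }
}

/** Stand-in for Extension: creates the plugin once per configuration, then reuses it. */
class SketchExtension {
    private final String id;
    private final Function<SketchConf, SketchConfigured> constructor;

    SketchExtension(String id, Function<SketchConf, SketchConfigured> constructor) {
        this.id = id;
        this.constructor = constructor;
    }

    SketchConfigured getExtensionInstance(SketchConf conf) {
        // Lock on the configuration so two threads cannot both create the instance.
        synchronized (conf) {
            Object cached = conf.getObject(id);
            if (cached == null) {
                cached = constructor.apply(conf);
                conf.setObject(id, cached);
            }
            return (SketchConfigured) cached;
        }
    }
}

/** Example plugin implementation taking the configuration in its constructor. */
class SketchParser extends SketchConfigured {
    SketchParser(SketchConf conf) {
        super(conf);
    }
}
```

In this sketch, two calls with the same configuration return the same plugin
object, while a different configuration gets its own instance, which is the
property that would let several configurations coexist in one JVM.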

That's exactly what I had in mind while speaking about a dynamic NutchConf
with Doug.
For me it's a +1
The only thing I don't really like is extending NutchConfigured, but it
is the safest way to implement it.



