nutch-dev mailing list archives

From Jérôme Charron <jerome.char...@gmail.com>
Subject Re: Static initializers
Date Tue, 20 Dec 2005 14:19:21 GMT
Andrzej,

How do you choose which NutchConf to use?
Here is a short discussion I had with Doug about a kind of dynamic NutchConf
inside the same JVM:

"... Looking at the mailing list archives, it seems that making some
behavior depend on a document's URL is a recurring problem (for instance,
boosting documents that match a URL pattern: see the NUTCH-16 issue and many
other threads).
So our idea is to provide a "dynamic" Nutch configuration (one that
overrides the default, as nutch-site does) based on URL patterns that
documents match. The idea is as follows:

1. As usual, the default configuration is the nutch-default.xml file.

2. An XML file can map URL regexps to several other configurations
(which override nutch-default):
<nutch:conf>
  <url regexp="http://www.mydomain1.com/.*">
    <!-- A set of nutch properties that override the nutch-default for this domain -->
    <property>
        <name>property1</name>
        <value>value1</value>
    </property>
    ....
  </url>
  ....
</nutch:conf>"
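The per-URL override idea above can be sketched in Java roughly as follows. This is a minimal illustration, not Nutch code: the class UrlScopedConf and its methods are hypothetical, properties are plain string maps, each regexp is assumed to be a java.util.regex pattern that must match the whole URL, and when several patterns match, later rules are assumed to win.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch: default properties plus per-URL-pattern overrides. Not a Nutch API. */
public class UrlScopedConf {

    /** One <url regexp="..."> block: a compiled pattern and the properties it overrides. */
    private static final class Rule {
        final Pattern pattern;
        final Map<String, String> props;

        Rule(Pattern pattern, Map<String, String> props) {
            this.pattern = pattern;
            this.props = props;
        }
    }

    private final Map<String, String> defaults;          // plays the role of nutch-default.xml
    private final List<Rule> rules = new ArrayList<>();   // kept in declaration order

    public UrlScopedConf(Map<String, String> defaults) {
        this.defaults = new HashMap<>(defaults);
    }

    /** Registers overrides for URLs that fully match the given Java regexp (assumption). */
    public void addOverride(String regexp, Map<String, String> props) {
        rules.add(new Rule(Pattern.compile(regexp), new HashMap<>(props)));
    }

    /** Effective properties for a URL: the defaults, then each matching rule in order (later wins). */
    public Map<String, String> forUrl(String url) {
        Map<String, String> effective = new HashMap<>(defaults);
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).matches()) {
                effective.putAll(rule.props);
            }
        }
        return effective;
    }

    public static void main(String[] args) {
        UrlScopedConf conf = new UrlScopedConf(Map.of("property1", "default"));
        conf.addOverride("http://www\\.mydomain1\\.com/.*", Map.of("property1", "value1"));

        System.out.println(conf.forUrl("http://www.mydomain1.com/page.html").get("property1")); // value1
        System.out.println(conf.forUrl("http://www.example.org/").get("property1"));            // default
    }
}
```

Resolving a fresh map per URL keeps the sketch simple; a real implementation would more likely cache one configuration object per matching rule set, which is where the plugin-instance question below comes in.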

What do you think about this?


> Looking deeper, this is messier than I thought... Some changes would
> be required to the plugin instantiation mechanisms, e.g.:
>
>     Extension.getExtensionInstance() -> getExtensionInstance(NutchConf)
>     ExtensionPoint.getExtensions() -> getExtensions(NutchConf)
>     PluginRepository.getExtensionPoint(String) -> getExtensionPoint(String, NutchConf)
>
> etc, etc...
>
> The way this would work would be similar to the mechanism described
> above: if plugin instances are not created yet, they would be created
> once (based on the current NutchConf argument), and then cached in this
> NutchConf instance.
>
> Also, the plugin implementations would have to extend
> NutchConfigured, taking NutchConf as the argument to their constructors,
> because Extension.getExtensionInstance would now pass the current
> NutchConf instance to their constructors.

That's exactly what I had in mind when I talked with Doug about a dynamic
NutchConf.
For me it's a +1.
The only thing I don't really like is extending NutchConfigured, but it is
the safest way to implement it.
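A minimal sketch of the scheme discussed above: a plugin instance is created lazily, once per configuration object, cached inside that configuration, and receives the configuration through its constructor. Conf, Configured and Extension here are hypothetical stand-ins for NutchConf, NutchConfigured and Extension, not their real signatures; instances are built through a constructor reference rather than the reflection a plugin repository would likely use, and the cache key (the extension id) is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of per-configuration plugin caching. Not the Nutch API. */
public class PluginCacheSketch {

    /** Stand-in for NutchConf: properties plus a per-instance object cache. */
    public static class Conf {
        private final Map<String, String> props = new HashMap<>();
        private final Map<String, Object> objectCache = new HashMap<>();

        public String get(String name, String dflt) { return props.getOrDefault(name, dflt); }
        public void set(String name, String value) { props.put(name, value); }
        public Map<String, Object> objectCache() { return objectCache; }
    }

    /** Stand-in for NutchConfigured: every plugin receives its Conf in the constructor. */
    public abstract static class Configured {
        protected final Conf conf;
        protected Configured(Conf conf) { this.conf = conf; }
    }

    /** Stand-in for an Extension: an id plus a way to build an instance from a Conf. */
    public static class Extension<T> {
        private final String id;
        private final Function<Conf, T> factory;

        public Extension(String id, Function<Conf, T> factory) {
            this.id = id;
            this.factory = factory;
        }

        /** Analogue of getExtensionInstance(NutchConf): create once per Conf, then reuse. */
        @SuppressWarnings("unchecked")
        public T getExtensionInstance(Conf conf) {
            return (T) conf.objectCache().computeIfAbsent(id, k -> factory.apply(conf));
        }
    }

    /** Example plugin whose behaviour depends on the Conf it was built with. */
    public static class UrlFilterPlugin extends Configured {
        public UrlFilterPlugin(Conf conf) { super(conf); }
        public String mode() { return conf.get("urlfilter.mode", "default"); }
    }

    public static void main(String[] args) {
        Extension<UrlFilterPlugin> ext = new Extension<>("urlfilter", UrlFilterPlugin::new);

        Conf siteA = new Conf();
        siteA.set("urlfilter.mode", "strict");
        Conf siteB = new Conf();

        UrlFilterPlugin a1 = ext.getExtensionInstance(siteA);
        UrlFilterPlugin a2 = ext.getExtensionInstance(siteA);
        UrlFilterPlugin b = ext.getExtensionInstance(siteB);

        System.out.println(a1 == a2);                    // true: cached in siteA
        System.out.println(a1 == b);                     // false: siteB gets its own instance
        System.out.println(a1.mode() + " " + b.mode());  // strict default
    }
}
```

Because the cache lives in the Conf object itself, two documents resolved to different configurations never share a plugin instance, while repeated lookups with the same configuration stay cheap. The trade-off Jérôme mentions shows up here too: every plugin class must accept a Conf in its constructor.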

Regards

Jérôme

--
http://motrech.free.fr/
http://www.frutch.org/
