nutch-dev mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: Static initializers
Date Tue, 20 Dec 2005 14:34:02 GMT
Jérôme Charron wrote:

>Andrzej,
>
>How do you choose the NutchConf to use ?
>  
>

It is provided as an argument to all constructors.

>Here is a short discussion I had with Doug about a kind of dynamic NutchConf
>inside the same JVM:
>
>"... Looking at the mailing list archives, it seems that making some
>behavior depend on the document's URL is a recurring problem (for instance
>for boosting documents matching a URL pattern - NUTCH-16 issue, and many
>other topics).
>So, our idea is to offer a "dynamic" nutch configuration (one that
>overrides the default one, as nutch-site does) based on documents
>matching URL patterns. The idea is as follows:
>  
>

Well, it's a neat idea, but it's not necessarily what I was proposing. 
My proposal could be the first step to implement this.

>1. The default configuration is, as usual, the nutch-default.xml file
>
>2. An xml file can map URL regexps to other configuration
>files (that override the nutch-default):
><nutch:conf>
>  <url regexp="http://www.mydomain1.com/*">
>    <!-- A set of nutch properties that override the nutch-default for this
>domain -->
>    <property>
>        <name>property1</name>
>        <value>value1</value>
>    </property>
>    ....
>   </url>
>   ....
></nutch:conf>"
>
>What do you think about this?
>  
>

Yes, if you can specify different configs for every run, or even for 
every invocation, it's certainly possible.
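
To make the URL-to-configuration mapping quoted above a bit more concrete, 
here is a minimal sketch of how such a lookup might work. Everything in it 
is an assumption for illustration: the UrlConfResolver class and its methods 
are hypothetical and not part of Nutch, java.util.Properties stands in for 
NutchConf, the first matching pattern wins, and a pattern only has to match 
the beginning of the URL.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical sketch: this class does not exist in Nutch.
// java.util.Properties is used here as a stand-in for NutchConf.
class UrlConfResolver {
    private final Properties defaults;
    // LinkedHashMap keeps insertion order, so the first registered
    // pattern that matches a URL is the one that applies.
    private final Map<Pattern, Properties> overrides = new LinkedHashMap<>();

    UrlConfResolver(Properties defaults) {
        this.defaults = defaults;
    }

    // Registers a set of properties that override the defaults
    // for URLs matching the given regular expression.
    void addOverride(String regexp, Properties props) {
        overrides.put(Pattern.compile(regexp), props);
    }

    // Returns a copy of the defaults overlaid with the first matching
    // override set, or just the defaults if no pattern matches.
    Properties confFor(String url) {
        Properties result = new Properties();
        result.putAll(defaults);
        for (Map.Entry<Pattern, Properties> entry : overrides.entrySet()) {
            // lookingAt() anchors the match at the start of the URL only.
            if (entry.getKey().matcher(url).lookingAt()) {
                result.putAll(entry.getValue());
                break;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        defaults.setProperty("property1", "default");

        Properties domainProps = new Properties();
        domainProps.setProperty("property1", "value1");

        UrlConfResolver resolver = new UrlConfResolver(defaults);
        resolver.addOverride("http://www\\.mydomain1\\.com/.*", domainProps);

        System.out.println(
            resolver.confFor("http://www.mydomain1.com/a.html").getProperty("property1"));
        System.out.println(
            resolver.confFor("http://other.example.org/").getProperty("property1"));
    }
}
```

Building a fresh object per lookup keeps the defaults untouched. A real 
implementation would probably cache the merged result per pattern.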

>
>>Looking deeper, this is messier than I thought... Some changes would
>>be required to the plugin instantiation mechanisms, e.g.:
>>
>>    Extension.getExtensionInstance() -> getExtensionInstance(NutchConf)
>>    ExtensionPoint.getExtensions() -> getExtensions(NutchConf)
>>    PluginRepository.getExtensionPoint(String) -> getExtensionPoint(String, NutchConf)
>>
>>etc, etc...
>>
>>The way this would work would be similar to the mechanism described
>>above: if plugin instances are not created yet, they would be created
>>once (based on the current NutchConf argument), and then cached in this
>>NutchConf instance.
>>
>>And also the plugin implementations would have to extend
>>NutchConfigured, taking NutchConf as the argument to their constructors
>>- because now the Extension.getExtensionInstance would pass the current
>>NutchConf instance to their constructors.
>>    
>>
>
>That's exactly what I had in mind while speaking about a dynamic NutchConf
>with Doug.
>For me it's a +1
>The only thing I don't really like is extending the NutchConfigured, but it
>is the safest way to implement it.
>  
>

Well, it's a form of enforcing a contract for the constructors. There is 
no other way to do it in Java - you can't specify the required 
constructors in an interface. OTOH you have the NutchConfigurable 
interface, which we could use instead, but then you have to remember to 
call setConf() before you do anything else...
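
A rough sketch of the two options, together with the "create once, cache 
in the conf instance" idea quoted above, might look like the following. 
All names here (Conf, Configured, Configurable, ExamplePlugin, 
PluginFactory) are hypothetical stand-ins, not the real Nutch classes, and 
the cache keyed by class name is just one assumed way to do it.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for NutchConf; it also holds the per-conf cache.
class Conf {
    private final Map<String, Object> cache = new HashMap<>();

    Object getCached(String key) { return cache.get(key); }
    void putCached(String key, Object value) { cache.put(key, value); }
}

// Option A: a base class whose only constructor requires a Conf, so every
// subclass constructor has to hand one to super(...).
abstract class Configured {
    private final Conf conf;

    protected Configured(Conf conf) { this.conf = conf; }

    public Conf getConf() { return conf; }
}

// Option B: an interface cannot declare a constructor, only a setter,
// so callers must remember to call setConf() before anything else.
interface Configurable {
    void setConf(Conf conf);
    Conf getConf();
}

// A plugin following option A.
class ExamplePlugin extends Configured {
    public ExamplePlugin(Conf conf) { super(conf); }
}

class PluginFactory {
    // Creates the plugin once per Conf instance and caches it in that Conf.
    static synchronized <T extends Configured> T getInstance(Class<T> type, Conf conf) {
        Object cached = conf.getCached(type.getName());
        if (cached != null) {
            return type.cast(cached);
        }
        try {
            // The (Conf) constructor is only checked here, at run time.
            Constructor<T> ctor = type.getConstructor(Conf.class);
            T instance = ctor.newInstance(conf);
            conf.putCached(type.getName(), instance);
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                "No usable (Conf) constructor on " + type.getName(), e);
        }
    }
}

class ConfDemo {
    public static void main(String[] args) {
        Conf a = new Conf();
        Conf b = new Conf();
        ExamplePlugin first = PluginFactory.getInstance(ExamplePlugin.class, a);
        ExamplePlugin again = PluginFactory.getInstance(ExamplePlugin.class, a);
        ExamplePlugin other = PluginFactory.getInstance(ExamplePlugin.class, b);
        System.out.println("same conf reuses instance: " + (first == again));
        System.out.println("different conf gets its own: " + (first != other));
    }
}
```

Note that even with the base class, the exact constructor signature is 
still not checked by the compiler. A subclass that declares some other 
constructor only fails when the reflective lookup runs.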

I'll work on this to see where it leads.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


