nutch-dev mailing list archives

From "Chris Mattmann" <chris.mattm...@jpl.nasa.gov>
Subject RE: tools cleanup
Date Thu, 31 Mar 2005 02:57:16 GMT
Hi Doug,

> 1. An "action" is an operation on Nutch data.  For example,
> GenerateSegmentFromDB, FetchSegment, UpdateDB, IndexSegment,
> MergeIndexes, SearchServer, etc. are all actions.
> 
> 2. A "tool" invokes an action from the command line.
> 
> The proposal:
> 
> 1. Actions and tools should be separate classes, in separate files.
> 
> 2. A tool class should define no methods other than a main() and perhaps
> those required to parse the command line.  All application logic should
> be in the action class.
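A minimal sketch of points 1 and 2 as quoted above, assuming an UpdateDB-style action. The class names `UpdateDb` and `UpdateDbTool`, their fields, and the argument layout are hypothetical illustrations, not Nutch's actual code. All logic sits in the action, and the tool only parses arguments and invokes it:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical action: all application logic lives here.
// Fields and method bodies are placeholders, not Nutch's real UpdateDB code.
class UpdateDb {
  private final String dbDir;
  private final List<String> segments = new ArrayList<>();

  UpdateDb(String dbDir) {
    this.dbDir = dbDir;
  }

  // Record a segment whose fetch results should be merged into the DB.
  void addSegment(String segmentDir) {
    segments.add(segmentDir);
  }

  // Placeholder for the real work: returns a summary instead of touching disk.
  String run() {
    return "updated " + dbDir + " from " + segments.size() + " segment(s)";
  }
}

// Hypothetical tool: only main() plus command-line parsing, per point 2.
class UpdateDbTool {
  // Parse "<db> <segment> [<segment> ...]" into a configured action.
  static UpdateDb parseArgs(String[] args) {
    if (args.length < 2) {
      throw new IllegalArgumentException("Usage: UpdateDbTool <db> <segment>...");
    }
    UpdateDb action = new UpdateDb(args[0]);
    for (int i = 1; i < args.length; i++) {
      action.addSegment(args[i]);
    }
    return action;
  }

  public static void main(String[] args) {
    System.out.println(parseArgs(args).run());
  }
}
```

With this split, running `UpdateDbTool db seg1 seg2` prints `updated db from 2 segment(s)`, and other code can drive `UpdateDb` directly without going through main().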

I agree with the general distinction here between "tools" and "actions". I
think it makes sense to separate action-specific code, such as "update the
DB", from typical command-line code, e.g., parsing command-line parameters,
setting up the action objects, and invoking them. One way we've addressed
this in the project I work on at JPL is by adding main() methods to the
"action" classes you define. So, for instance, you would have:

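interface IAction {
     void method1();
}

class Action implements IAction {

     public void method1() {
       // action logic, e.g. "update the DB"
     }

     public String getSomeAttribute() {
       return "someAttribute";
     }

     // more methods

     public static void main(String[] args) {
       // Tool code goes here: parse args, then set up and invoke the action
       new Action().method1();
     }
}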

So it's really just another way of thinking about it: the tool code lives in
the action class's main() rather than in a separate class and file.

> 
> 3. All actions must implement the following interface:
> 
>    public interface NutchConfigurable {
>      void setConf(NutchConf conf);
>      NutchConf getConf();
>    }

Is this so that actions can have different configuration files? I'm not
*that* familiar with the NutchConf class. Does it read a set of default
configuration files? If so, why would you need to pass a NutchConf to the
action?


> 
> 4. Most actions should implement this by extending:
> 
>    public class NutchConfigured implements NutchConfigurable {
>      private NutchConf conf;
>      public NutchConfigured(NutchConf conf) { setConf(conf); }
>      public void setConf(NutchConf conf) { this.conf = conf; }
>      public NutchConf getConf() { return conf; }
>    }
> 
> 5. All plugins must implement NutchConfigurable.
> 
> 6. Plugin factory methods must accept a NutchConf.
> 
> For example:
> 
>    public static Protocol ProtocolFactory.getProtocol(String url);
> 
> will become:
> 
>    public static Protocol ProtocolFactory.getProtocol(NutchConf, String);
> 

I think this makes total sense if a NutchConf can hold different resource
configuration files for different actions.
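A sketch of how points 4 to 6 might fit together, assuming hypothetical `Protocol` and `HttpProtocol` types and again using a minimal stand-in `NutchConf` rather than the real class. `NutchConfigured` is copied from the proposal; the factory passes the caller's conf to the plugin it creates. The key `http.agent.name` and the fallback `"unnamed-agent"` are used only as examples:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for NutchConf (illustration only, not the real API).
class NutchConf {
  private final Map<String, String> props = new HashMap<>();

  NutchConf set(String name, String value) {
    props.put(name, value);
    return this;
  }

  String get(String name, String defaultValue) {
    String v = props.get(name);
    return v == null ? defaultValue : v;
  }
}

interface NutchConfigurable {
  void setConf(NutchConf conf);
  NutchConf getConf();
}

// Base class from point 4 of the proposal.
class NutchConfigured implements NutchConfigurable {
  private NutchConf conf;
  public NutchConfigured(NutchConf conf) { setConf(conf); }
  public void setConf(NutchConf conf) { this.conf = conf; }
  public NutchConf getConf() { return conf; }
}

// Hypothetical plugin interface: per point 5, every plugin is NutchConfigurable.
interface Protocol extends NutchConfigurable {
  String describe(String url);
}

// Hypothetical HTTP plugin that reads settings from the conf it was given.
class HttpProtocol extends NutchConfigured implements Protocol {
  HttpProtocol(NutchConf conf) { super(conf); }

  public String describe(String url) {
    return "GET " + url + " as " + getConf().get("http.agent.name", "unnamed-agent");
  }
}

// Point 6: the factory method accepts a NutchConf and hands it to the plugin.
class ProtocolFactory {
  public static Protocol getProtocol(NutchConf conf, String url) {
    // Case-insensitive check for an "http:" scheme prefix.
    if (url.regionMatches(true, 0, "http:", 0, 5)) {
      return new HttpProtocol(conf);
    }
    throw new IllegalArgumentException("no protocol plugin for: " + url);
  }
}
```

The design point is that the plugin never looks up configuration on its own; whoever calls the factory decides which NutchConf it sees.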

> Comments?
> 
> Doug

Nice proposal. I think it makes a lot of sense, and it should help with some
changes that I think need to be made to the XML-reading code in Nutch's
PluginRepository (i.e., refactoring that code to use only one XML API).


Cheers,
  Chris


