nutch-dev mailing list archives

From Eyeris Rodriguez Rueda <>
Subject make responseTime native in nutch
Date Mon, 06 Feb 2017 14:54:14 GMT
Hi all,
Nutch has a configuration option that saves the response time of every URL
that is fetched. The value is stored in the CrawlDatum under the key _rs_,
but it is not indexed.
It would be very useful to index this value as well.
The value is important in many cases, and it is easy to support natively
in Nutch.
A small change to the index-basic plugin (or another one) can make this happen.

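// index the response time of each URL if the property is true
    boolean property = conf.getBoolean("", true);
    if (property) {
      Writable rs = datum.getMetaData().get(new Text("_rs_"));
      if (rs != null) {  // missing when the response time was not stored at fetch time
        doc.add("responseTime", rs.toString());  // field name is only a suggestion
      }
    }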
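
To show the intended logic without a Nutch build, here is a minimal standalone sketch. It assumes plain Java maps as stand-ins for the CrawlDatum metadata and the indexed document; the class name, the method name and the "responseTime" field name are hypothetical and not part of Nutch. The main point is the null check: the _rs_ entry may be missing if response times were not stored at fetch time.

```java
import java.util.HashMap;
import java.util.Map;

public class ResponseTimeIndexingSketch {

  // Key under which the response time is stored in the datum metadata.
  static final String RESPONSE_TIME_KEY = "_rs_";

  // Hypothetical index field name; not an existing Nutch field.
  static final String INDEX_FIELD = "responseTime";

  /**
   * Copies the response time from the datum metadata into the document
   * when indexing is enabled and the value is present.
   */
  static void indexResponseTime(boolean enabled,
                                Map<String, Object> datumMetadata,
                                Map<String, Object> doc) {
    if (!enabled) {
      return;
    }
    Object value = datumMetadata.get(RESPONSE_TIME_KEY);
    if (value != null) {  // skip URLs that carry no response time
      doc.put(INDEX_FIELD, value.toString());
    }
  }

  public static void main(String[] args) {
    Map<String, Object> metadata = new HashMap<>();
    metadata.put(RESPONSE_TIME_KEY, 125);

    Map<String, Object> doc = new HashMap<>();
    indexResponseTime(true, metadata, doc);
    System.out.println(doc);
  }
}
```

In a real plugin the same check would live in the indexing filter, with the boolean read from the configuration and the value added to the NutchDocument.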

I can create the Jira ticket and a patch for this.
What do you think about it?
