nutch-dev mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Nutch Wiki] Update of "bin/nutch inject" by JulienNioche
Date Wed, 18 Jun 2014 11:15:27 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "bin/nutch inject" page has been changed by JulienNioche:
https://wiki.apache.org/nutch/bin/nutch%20inject?action=diff&rev1=2&rev2=3

  
  '''<url_dir>''': The directory containing our seed list (referred to above as the 'flat file'), usually a text file with one URL per line.
  
+ The injector uses the following configuration properties (see https://issues.apache.org/jira/browse/NUTCH-1405):
+ 
+ * db.injector.overwrite = [true|false] : Replaces the entries in the crawldb with the corresponding ones from the seed data and sets their status to UNFETCHED.
+ 
+ * db.injector.update = [true|false] : Keeps the existing entries in the crawldb but replaces their score and fetch interval with the values found for the corresponding entries in the seed data. Any metadata found for the seed entry is added. The status remains what it was in the original version of the crawldb, e.g. FETCHED.
+ 
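The two properties described above can be pictured with a small Python sketch. It is not Nutch code: the `Entry` class and the `merge_entry` function are hypothetical names made up for illustration. It also makes two assumptions the page does not state: that `overwrite` takes precedence when both flags are true, and that an existing entry is left untouched when both are false.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class Entry:
    """Hypothetical stand-in for a crawldb record (not the Nutch class)."""
    status: str
    score: float
    fetch_interval: int
    metadata: dict = field(default_factory=dict)


def merge_entry(existing: Optional[Entry], seed: Entry,
                overwrite: bool = False, update: bool = False) -> Entry:
    """Combine an existing crawldb entry with a seed entry for the same URL."""
    if existing is None or overwrite:
        # New URL, or db.injector.overwrite=true: the seed entry wins
        # and its status is set to UNFETCHED.
        return replace(seed, status="UNFETCHED", metadata=dict(seed.metadata))
    if update:
        # db.injector.update=true: keep the existing status, take score and
        # fetch interval from the seed, and add the seed metadata.
        return Entry(
            status=existing.status,
            score=seed.score,
            fetch_interval=seed.fetch_interval,
            metadata={**existing.metadata, **seed.metadata},
        )
    # Assumption: with both flags false the existing entry is kept as-is.
    return existing


if __name__ == "__main__":
    old = Entry("FETCHED", 1.0, 2592000, {"source": "crawl"})
    new = Entry("UNFETCHED", 5.0, 86400, {"label": "seed"})
    print(merge_entry(old, new, update=True))
    print(merge_entry(old, new, overwrite=True))
```

With `update=True` the merged entry keeps the FETCHED status while taking the seed's score and fetch interval; with `overwrite=True` it becomes a copy of the seed entry with status UNFETCHED.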
  === Nutch 2.x ===
  {{{
  Usage: InjectorJob <url_dir> [-crawlId <id>]
