nutch-dev mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Nutch Wiki] Update of "bin/nutch_invertlinks" by LewisJohnMcgibbney
Date Sat, 02 Jul 2011 15:17:57 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "bin/nutch_invertlinks" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/bin/nutch_invertlinks

Comment:
Update to reflect Nutch 1.3 API

New page:
Invertlinks is an alias for org.apache.nutch.crawl.LinkDb

This class maintains an inverted link map, listing the incoming links for each URL. Its
declaration is: public class LinkDb extends Configured implements Tool, Mapper<Text, ParseData, Text, Inlinks>
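
To illustrate what "inverted link map" means, here is a minimal sketch in plain Java. It is not Nutch's implementation (LinkDb runs as a Hadoop job over segment data); the class name InvertLinksSketch, the invert method and the example URLs are all hypothetical and exist only for this illustration. It turns a map of "page to the pages it links to" into a map of "page to the pages that link to it".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: not part of Nutch.
public class InvertLinksSketch {

    // Invert "source URL -> outgoing links" into "target URL -> incoming links".
    public static Map<String, List<String>> invert(Map<String, List<String>> outlinks) {
        // TreeMap keeps the target URLs sorted, which makes the output easy to read.
        Map<String, List<String>> inlinks = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : outlinks.entrySet()) {
            String from = entry.getKey();
            for (String to : entry.getValue()) {
                // Record "from" as one of the pages linking to "to".
                inlinks.computeIfAbsent(to, k -> new ArrayList<>()).add(from);
            }
        }
        return inlinks;
    }

    public static void main(String[] args) {
        // Made-up example data: a links to b and c, b links to c.
        Map<String, List<String>> outlinks = new LinkedHashMap<>();
        outlinks.put("http://a.example/", Arrays.asList("http://b.example/", "http://c.example/"));
        outlinks.put("http://b.example/", Arrays.asList("http://c.example/"));

        for (Map.Entry<String, List<String>> entry : invert(outlinks).entrySet()) {
            System.out.println(entry.getKey() + " <- " + entry.getValue());
        }
    }
}
```

A page with no incoming links (here http://a.example/) simply has no entry in the inverted map.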

Usage:

{{{
bin/nutch invertlinks <linkdb> (-dir <segmentsDir> | <seg1> <seg2> ...) [-force] [-noNormalize] [-noFilter]
}}}

'''<linkdb>''': This should be the path to the output linkdb to create or update.

'''-dir <segmentsDir>''': The parent directory containing several segments, OR

'''<seg1> <seg2> ...''': A list of segment directories from which to create an inverted
linkdb.

'''[-force]''': This argument forces an update even if the linkdb appears to be locked.
/!\ CAUTION advised. /!\

'''[-noNormalize]''': Pass this option to skip normalization of link URLs. The linkdb then
holds a true representation of the incoming links as they were found.

'''[-noFilter]''': Pass this option to skip applying the currently configured URLFilters
to link URLs.


CommandLineOptions
