nutch-dev mailing list archives

From "Chris Mattmann"
Subject Re: Review Request 9119: Create SegmentContentDumperTool for easily extracting out file contents from SegmentDirs
Date Sat, 06 Sep 2014 04:57:07 GMT

This is an automatically generated e-mail. To reply, visit:

(Updated Sept. 6, 2014, 4:57 a.m.)

Review request for nutch and Julien Le Dem.

Bugs: NUTCH-1526

Repository: nutch


Will contain the patch for the SegmentContentDumperTool described in NUTCH-1526:

./bin/nutch [options]
   -segmentRootDir full file path to the root segment directory, e.g., crawl/segments
   -regexUrlPattern a regex URL pattern to select URL keys to dump from the content DB in each segment
   -outputDir The output directory to write file names to.
   -metadata --key=value where key is a Content Metadata key and value is a value to check.
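
As a rough illustration of how the -regexUrlPattern and -metadata options might combine when deciding which records to dump, here is a minimal sketch. It is not taken from the patch: the class name DumpFilterSketch, the method shouldDump, the use of a plain Map for content metadata, and the choice of substring matching (find) over whole-string matching are all assumptions made for the example.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper, NOT part of the NUTCH-1526 patch: it only illustrates
// the selection logic implied by the option descriptions above.
final class DumpFilterSketch {
    private final Pattern urlPattern;    // compiled from -regexUrlPattern
    private final String metadataKey;    // key from -metadata --key=value, or null if unused
    private final String metadataValue;  // value from -metadata --key=value, or null if unused

    DumpFilterSketch(String regexUrlPattern, String metadataKey, String metadataValue) {
        this.urlPattern = Pattern.compile(regexUrlPattern);
        this.metadataKey = metadataKey;
        this.metadataValue = metadataValue;
    }

    // Returns true when the URL key matches the pattern and, if a metadata
    // check was requested, the record's metadata holds the expected value.
    boolean shouldDump(String url, Map<String, String> contentMetadata) {
        // Assumption: the pattern may match anywhere in the URL (find, not matches).
        if (!urlPattern.matcher(url).find()) {
            return false;
        }
        if (metadataKey == null) {
            return true; // no -metadata filter supplied
        }
        String actual = contentMetadata.get(metadataKey);
        return actual != null && actual.equals(metadataValue);
    }
}
```

Under these assumptions, a filter built from the pattern `\.pdf$` with the metadata pair Content-Type=application/pdf would select only PDF URLs whose stored Content-Type agrees; the real tool may treat either option differently.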

Diffs (updated)

  ./trunk/src/java/org/apache/nutch/tools/ PRE-CREATION 



Testing it on DARPA XDATA XNET.


Chris Mattmann
