nutch-dev mailing list archives

From "Chris Mattmann" <mattm...@apache.org>
Subject Re: Review Request 9119: Create SegmentContentDumperTool for easily extracting out file contents from SegmentDirs
Date Sat, 06 Sep 2014 04:57:07 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9119/
-----------------------------------------------------------

(Updated Sept. 6, 2014, 4:57 a.m.)


Review request for nutch and Julien Le Dem.


Bugs: NUTCH-1526
    https://issues.apache.org/jira/browse/NUTCH-1526


Repository: nutch


Description
-------

Will contain the patch for the SegmentContentDumperTool described in NUTCH-1526:

./bin/nutch org.apache.nutch.tools.SegmentContentDumper [options]
   -segmentRootDir full file path to the root segment directory, e.g., crawl/segments
   -regexUrlPattern a regex URL pattern to select URL keys to dump from the content DB in each segment
   -outputDir The output directory to write file names to.
   -metadata --key=value where key is a Content Metadata key and value is a value to check.
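
To illustrate how the -regexUrlPattern and -metadata options might combine, here is a minimal, self-contained sketch. It is not the code in the patch: the class name SegmentFilterSketch and the helpers accepts and parseMetadataArg are hypothetical, content metadata is modeled as a plain Map rather than any Nutch type, and it assumes the regex is applied with find() and the metadata value is compared for exact equality.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names come from the Nutch code base.
public class SegmentFilterSketch {

    // Splits a "--key=value" argument into {key, value}.
    static String[] parseMetadataArg(String arg) {
        String stripped = arg.startsWith("--") ? arg.substring(2) : arg;
        int eq = stripped.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("expected --key=value, got: " + arg);
        }
        return new String[] { stripped.substring(0, eq), stripped.substring(eq + 1) };
    }

    // Returns true when the URL matches the pattern (if one was given) and the
    // record's metadata contains the requested key/value pair (if one was given).
    static boolean accepts(String url, Map<String, String> metadata,
                           Pattern urlPattern, String metaKey, String metaValue) {
        if (urlPattern != null && !urlPattern.matcher(url).find()) {
            return false;
        }
        if (metaKey != null) {
            // Assumption: exact, case-sensitive comparison of the value.
            return metaValue != null && metaValue.equals(metadata.get(metaKey));
        }
        return true;
    }

    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("\\.pdf$");
        String[] kv = parseMetadataArg("--Content-Type=application/pdf");

        // Stand-in for the (URL key, content metadata) pairs read from a segment.
        Map<String, Map<String, String>> records = new LinkedHashMap<>();
        records.put("http://example.org/a.pdf", Map.of("Content-Type", "application/pdf"));
        records.put("http://example.org/b.html", Map.of("Content-Type", "text/html"));

        for (Map.Entry<String, Map<String, String>> e : records.entrySet()) {
            if (accepts(e.getKey(), e.getValue(), pattern, kv[0], kv[1])) {
                System.out.println("would dump: " + e.getKey());
            }
        }
    }
}
```

In this sketch a record is dumped only when both filters pass; whether the actual tool ANDs the two options together is also an assumption.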


Diffs (updated)
-----

  ./trunk/src/java/org/apache/nutch/tools/FileDumper.java PRE-CREATION 

Diff: https://reviews.apache.org/r/9119/diff/


Testing
-------

Testing it on DARPA XDATA XNET.


Thanks,

Chris Mattmann

