nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2454) REST API fix for usage of hostdb in generator
Date Wed, 03 Jan 2018 17:31:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309950#comment-16309950 ]

ASF GitHub Bot commented on NUTCH-2454:
---------------------------------------

lewismc closed pull request #248: fix for NUTCH-2454 REST API fix for usage of hostdb in generator
URL: https://github.com/apache/nutch/pull/248
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/src/java/org/apache/nutch/crawl/Generator.java b/src/java/org/apache/nutch/crawl/Generator.java
index e5f4831d0..6af2ae671 100644
--- a/src/java/org/apache/nutch/crawl/Generator.java
+++ b/src/java/org/apache/nutch/crawl/Generator.java
@@ -929,8 +929,9 @@ public int run(String[] args) throws Exception {
     boolean force = false;
     int maxNumSegments = 1;
     String expr = null;
-
+    String hostdb = null;
     Path crawlDb;
+    
     if(args.containsKey(Nutch.ARG_CRAWLDB)) {
       Object crawldbPath = args.get(Nutch.ARG_CRAWLDB);
       if(crawldbPath instanceof Path) {
@@ -957,6 +958,9 @@ public int run(String[] args) throws Exception {
     else {
       segmentsDir = new Path(crawlId+"/segments");
     }
+    if (args.containsKey(Nutch.ARG_HOSTDB)) {
+      	hostdb = (String)args.get(Nutch.ARG_HOSTDB);
+    }
     
     if (args.containsKey("expr")) {
       expr = (String)args.get("expr");
@@ -986,7 +990,7 @@ public int run(String[] args) throws Exception {
 
     try {
       Path[] segs = generate(crawlDb, segmentsDir, numFetchers, topN, curTime,
-          filter, norm, force, maxNumSegments, expr);
+          filter, norm, force, maxNumSegments, expr, hostdb);
       if (segs == null){
         results.put(Nutch.VAL_RESULT, Integer.toString(1));
         return results;
diff --git a/src/java/org/apache/nutch/metadata/Nutch.java b/src/java/org/apache/nutch/metadata/Nutch.java
index 7ad0b5edb..8d485e5c8 100644
--- a/src/java/org/apache/nutch/metadata/Nutch.java
+++ b/src/java/org/apache/nutch/metadata/Nutch.java
@@ -97,6 +97,9 @@
 	public static final String ARG_SEGMENTDIR = "segment_dir";
 	/** Argument key to specify the location of individual segment for the REST endpoints **/
 	public static final String ARG_SEGMENT = "segment";
+	/** Argument key to specify the location of hostdb for the REST endpoints **/
+	public static final String ARG_HOSTDB = "hostdb";
+
 	
 	/** Title key in the Pub/Sub event metadata for the title of the parsed page*/
 	public static final String FETCH_EVENT_TITLE = "title";
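
The argument handling added to Generator.java above can be illustrated in isolation. The following is only a minimal sketch, not code from the pull request: the class name HostDbArgSketch, the helper resolveHostDb, and the path "crawl/hostdb" are hypothetical, and the constant is copied here rather than imported from org.apache.nutch.metadata.Nutch so that the example compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;

public class HostDbArgSketch {
  // Same key as Nutch.ARG_HOSTDB in the diff; duplicated so the sketch is standalone.
  static final String ARG_HOSTDB = "hostdb";

  /**
   * Hypothetical helper mirroring the added lines: returns the hostdb
   * location if the argument map carries one, otherwise null.
   */
  static String resolveHostDb(Map<String, Object> args) {
    String hostdb = null;
    if (args.containsKey(ARG_HOSTDB)) {
      hostdb = (String) args.get(ARG_HOSTDB);
    }
    return hostdb;
  }

  public static void main(String[] unused) {
    Map<String, Object> jobArgs = new HashMap<>();
    // No "hostdb" key: the value stays null, as it does in the patched method.
    System.out.println(resolveHostDb(jobArgs));

    // "crawl/hostdb" is an illustrative path, not one taken from the issue.
    jobArgs.put(ARG_HOSTDB, "crawl/hostdb");
    System.out.println(resolveHostDb(jobArgs));
  }
}
```

As in the patch, a missing key leaves the value null, and that null is what gets passed on to generate(...). Note that the cast assumes the caller supplies the value as a String.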


 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> REST API fix for usage of hostdb in generator
> ---------------------------------------------
>
>                 Key: NUTCH-2454
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2454
>             Project: Nutch
>          Issue Type: Bug
>          Components: generator
>    Affects Versions: 1.12
>            Reporter: Semyon Semyonov
>             Fix For: 1.15
>
>         Attachments: NUTCH-2368_RESTAPI_Fix.patch
>
>
> NUTCH-2368: Variable generate.max.count and fetcher.server.delay



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
