nutch-dev mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Nutch Wiki] Update of "Nutch_1.X_RESTAPI/RunningJobsTutorial" by SujenShah
Date Mon, 08 Jun 2015 17:30:27 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "Nutch_1.X_RESTAPI/RunningJobsTutorial" page has been changed by SujenShah:
https://wiki.apache.org/nutch/Nutch_1.X_RESTAPI/RunningJobsTutorial?action=diff&rev1=4&rev2=5

  2. :~$ bin/nutch startserver -port <port_number> -host <host_name> [If the host/port option is not specified, the server starts on localhost:8081 by default.]
  
  == Jobs ==
- Currently the service supports the running of the following jobs - Inject, Generate, Fetch, Parse, Updatedb, Invertlinks, Dedup and Readdb.
+ Currently the service supports the running of the following jobs - Inject, Generate, Fetch, Parse, Index, Updatedb, Invertlinks, Dedup and Readdb.
  Any new job can be created by issuing a POST request to /job/create with the following JSON data:
  {{{{
  POST /job/create
@@ -80, +80 @@

  }}}}
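Every job on this page is started the same way: by POSTing a JSON body to /job/create. The Python sketch below shows one way to build and send that body using only the standard library. It assumes the server's default address (localhost:8081). The function names are hypothetical, and apart from "INDEX" (the only type string shown on this page) the job-type strings are assumed to be the upper-case form of the job names listed above.

```python
import json
import urllib.request

# Default address used by `bin/nutch startserver` when no host/port is given.
DEFAULT_BASE_URL = "http://localhost:8081"

# Assumed "type" strings: only "INDEX" is shown on this page; the others are
# guessed as the upper-case form of each supported job name.
SUPPORTED_TYPES = {
    "INJECT", "GENERATE", "FETCH", "PARSE", "INDEX",
    "UPDATEDB", "INVERTLINKS", "DEDUP", "READDB",
}


def build_job_request(job_type, conf_id, crawl_id, args=None):
    """Return the JSON-serialisable body for POST /job/create."""
    job_type = job_type.upper()
    if job_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported job type: {job_type}")
    return {
        "type": job_type,
        "confId": conf_id,
        "crawlId": crawl_id,
        "args": dict(args or {}),
    }


def create_job(body, base_url=DEFAULT_BASE_URL, timeout=30):
    """POST the body to /job/create and return the decoded JSON response."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/job/create",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `create_job(build_job_request("index", "new-config", "crawl01"))` would send the same body as the Index Job request on this page, provided a server is running at the default address.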
  
  === Fetch Job ===
- To run the generate job call POST /job/create with following
+ To run the fetch job, call POST /job/create with the following:
  {{{{
  POST /job/create
  {  
@@ -109, +109 @@

  }}}}
  
  === Parse Job ===
- To run the generate job call POST /job/create with following
+ To run the parse job, call POST /job/create with the following:
  {{{{
  POST /job/create
  {  
@@ -137, +137 @@

  }
  }}}}
  
+ === Index Job ===
+ To run the index job, call POST /job/create with the following:
+ {{{{
+ POST /job/create
+ {  
+     "type":"INDEX",
+     "confId":"new-config",
+     "crawlId":"crawl01",
+     "args": {}
+ }
+ }}}}
+ 
+ Before running the index job, the user needs to configure an indexer. User-defined indexers (such as Solr or Elasticsearch) can be configured through the configuration endpoint.
+ A detailed description of how to configure and run the index job can be found at [[https://wiki.apache.org/nutch/Nutch_1.X_RESTAPI/RunningJobsTutorial/IndexJob|here]].
+ 
+ The args map accepts the following keys: crawldb, linkdb, params, dir, segments, noCommit, deleteGone, filter, normalize.
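A client can check an INDEX job's args against the keys listed above before sending the request. This is a minimal sketch that assumes the segment key is spelled `segments`. The expected value types (paths, booleans and so on) are not documented here, so any values passed in are placeholders, and the helper names are hypothetical.

```python
# Keys accepted in "args" for an INDEX job, as listed on this page
# (assuming the segment key is spelled "segments").
INDEX_ARG_KEYS = {
    "crawldb", "linkdb", "params", "dir", "segments",
    "noCommit", "deleteGone", "filter", "normalize",
}


def build_index_args(**kwargs):
    """Return an args dict for an INDEX job, rejecting unknown keys."""
    unknown = set(kwargs) - INDEX_ARG_KEYS
    if unknown:
        raise ValueError(f"unknown index args: {sorted(unknown)}")
    return dict(kwargs)


def build_index_job(conf_id, crawl_id, **index_args):
    """Return the full POST /job/create body for an INDEX job."""
    return {
        "type": "INDEX",
        "confId": conf_id,
        "crawlId": crawl_id,
        "args": build_index_args(**index_args),
    }
```

Calling `build_index_job("new-config", "crawl01")` with no extra arguments reproduces the request body shown in the Index Job section, while a misspelled key fails on the client instead of being sent to the server.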
+ 
+ The response to the request is the following JSON output:
+ {{{{
+ {
+     "confId":"new-config",
+     "args":{},
+     "crawlId":"crawl01",
+     "msg":"OK",
+     "id":"default-INDEX-572647647",
+     "state":"RUNNING",
+     "type":"INDEX",
+     "result":null
+ }
+ }}}}
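The response carries the job `id` and its current `state`, which a client can keep in order to follow the job. The sketch below parses a response like the one shown above and then polls until the job reaches a terminal state. The polling function is supplied by the caller (for instance, one issuing GET /job/<id>, an endpoint assumed here rather than documented on this page). The terminal state names are also assumptions, since only "RUNNING" appears above.

```python
import json
import time

# "RUNNING" appears in the sample response; these terminal state names are
# assumptions and should be checked against the server's actual output.
TERMINAL_STATES = {"FINISHED", "FAILED", "KILLED"}


def parse_job_response(text):
    """Return (job id, state) from the JSON returned by POST /job/create."""
    info = json.loads(text)
    return info["id"], info["state"]


def wait_for_job(job_id, fetch_state, interval=5.0, max_polls=120):
    """Poll fetch_state(job_id) until it reports a terminal state.

    fetch_state is a caller-supplied function, e.g. one that issues
    GET /job/<id> (an assumed endpoint) and returns its "state" field.
    """
    for _ in range(max_polls):
        state = fetch_state(job_id)
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} not finished after {max_polls} polls")
```

Passing the polling function in, rather than hard-coding an HTTP call, keeps the loop independent of the exact status endpoint, which this page does not describe.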
+ 
+ 
  === Updatedb Job ===
- To run the generate job call POST /job/create with following
+ To run the updatedb job, call POST /job/create with the following:
  {{{{
  POST /job/create
  {  
@@ -167, +199 @@

  }}}}
  
  === Invertlinks Job ===
- To run the generate job call POST /job/create with following
+ To run the invertlinks job, call POST /job/create with the following:
  {{{{
  POST /job/create
  {  
@@ -198, +230 @@

  
  
  === Dedup Job ===
- To run the generate job call POST /job/create with following
+ To run the dedup job, call POST /job/create with the following:
  {{{{
  POST /job/create
  {  
