nutch-dev mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Nutch Wiki] Update of "Incremental Crawling Scripts Test" by Gabriele Kahlout
Date Tue, 29 Mar 2011 09:58:07 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "Incremental Crawling Scripts Test" page has been changed by Gabriele Kahlout.
The comment on this change is: updated output.
http://wiki.apache.org/nutch/Incremental%20Crawling%20Scripts%20Test?action=diff&rev1=4&rev2=5

--------------------------------------------------

  
  == 3. ==
  {{{
- $ ./whole-web-crawling-incremental urls-input/MR6
+ $ ./whole-web-crawling-incremental -i 15 -d 2 seeds/MR6
- bin/hadoop dfs -rmr crawl
- Deleted file:/Users/simpatico/nutch-1.2/crawl
- 
- curl --fail http://localhost:8080/solr/update?commit=true -d '<delete><query>*:*</query></delete>'
- <?xml version="1.0" encoding="UTF-8"?>
- <response>
- <lst name="responseHeader"><int name="status">0</int><int name="QTime">8</int></lst>
- </response>
- 
- rmr: cannot remove urls-input/MR6/it_seeds: No such file or directory.
+ rmr: cannot remove seeds/MR6/it_seeds: No such file or directory.
- bin/hadoop dfs -get urls-input/MR6/2urls urls-input/MR6/urls-local-only
+ bin/hadoop dfs -get seeds/MR6/20simple-urls seeds/MR6/urls-local-only
  
- 2 urls to crawl
+ 20 urls to crawl
- rm: cannot remove urls-input/MR6/it_seeds/urls: No such file or directory.
+ rm: cannot remove seeds/MR6/it_seeds/urls: No such file or directory.
  
- bin/nutch inject crawl/crawldb/0/0 urls-input/MR6/it_seeds
+ bin/nutch inject crawl/crawldb/0 seeds/MR6/it_seeds
- Injector: starting at 2011-03-28 23:37:13
+ Injector: starting at 2011-03-29 11:46:14
- Injector: crawlDb: crawl/crawldb/0/0
+ Injector: crawlDb: crawl/crawldb/0
- Injector: urlDir: urls-input/MR6/it_seeds
+ Injector: urlDir: seeds/MR6/it_seeds
  Injector: Converting injected urls to crawl db entries.
  Injector: Merging injected urls into crawl db.
- Injector: finished at 2011-03-28 23:37:20, elapsed: 00:00:07
+ Injector: finished at 2011-03-29 11:46:27, elapsed: 00:00:13
+ 
  
  generate-fetch-updatedb-invertlinks-index-merge iteration 0:
- 
- bin/nutch generate crawl/crawldb/0/0 crawl/segments -topN 10
+ bin/nutch generate crawl/crawldb/0 crawl/segments -topN 15
- Generator: starting at 2011-03-28 23:37:22
- Generator: Selecting best-scoring urls due for fetch.
- Generator: filtering: true
- Generator: normalizing: true
- Generator: topN: 10
- Generator: jobtracker is 'local', generating exactly one partition.
- Generator: Partitioning selected urls for politeness.
- Generator: segment: crawl/segments/20110328233727
- Generator: finished at 2011-03-28 23:37:30, elapsed: 00:00:07
+ Generator: starting at 2011-03-29 11:46:31
+ Generator: Selecting best-scoring urls due for fetch.
+ Generator: filtering: true
+ Generator: normalizing: true
+ Generator: topN: 15
+ Generator: jobtracker is 'local', generating exactly one partition.
+ Generator: Partitioning selected urls for politeness.
+ Generator: segment: crawl/segments/20110329114641
+ Generator: finished at 2011-03-29 11:46:45, elapsed: 00:00:13
  
- bin/nutch fetch crawl/segments/20110328233727
+ bin/nutch fetch crawl/segments/20110329114641
- Fetcher: starting at 2011-03-28 23:37:31
+ Fetcher: starting at 2011-03-29 11:46:49
- Fetcher: segment: crawl/segments/20110328233727
+ Fetcher: segment: crawl/segments/20110329114641
  Fetcher: threads: 10
- QueueFeeder finished: total 2 records + hit by time limit :0
+ QueueFeeder finished: total 15 records + hit by time limit :0
- fetching http://localhost:8080/qui/2.html
+ fetching http://simple.wikipedia.org/wiki/%C2%A3sd
+ -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=14
+ -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=14
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=14
- * queue: http://localhost
-   maxThreads    = 1
-   inProgress    = 0
-   crawlDelay    = 5000
-   minCrawlDelay = 0
-   nextFetchTime = 1301348260190
-   now           = 1301348255771
-   0. http://localhost:8080/qui/1.html
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=14
- * queue: http://localhost
-   maxThreads    = 1
-   inProgress    = 0
-   crawlDelay    = 5000
-   minCrawlDelay = 0
-   nextFetchTime = 1301348260190
-   now           = 1301348256777
-   0. http://localhost:8080/qui/1.html
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=14
- * queue: http://localhost
-   maxThreads    = 1
-   inProgress    = 0
-   crawlDelay    = 5000
-   minCrawlDelay = 0
-   nextFetchTime = 1301348260190
-   now           = 1301348257779
-   0. http://localhost:8080/qui/1.html
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=14
+ fetching http://simple.wikipedia.org/wiki/%2B44
- * queue: http://localhost
-   maxThreads    = 1
-   inProgress    = 0
-   crawlDelay    = 5000
-   minCrawlDelay = 0
-   nextFetchTime = 1301348260190
-   now           = 1301348258780
-   0. http://localhost:8080/qui/1.html
- -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=13
- * queue: http://localhost
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=13
+ -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=13
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=13
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=13
+ fetching http://simple.wikipedia.org/wiki/%28What%27s_the_Story%29_Morning_Glory%3F
+ -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=12
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=12
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=12
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=12
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=12
+ fetching http://simple.wikipedia.org/wiki/%C3%81ngel_S%C3%A1nchez_%28baseball%29
+ -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=11
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=11
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=11
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=11
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=11
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=11
+ fetching http://simple.wikipedia.org/wiki/%C3%81ngel_Javier_Arizmendi
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=10
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=10
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=10
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=10
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=10
+ fetching http://simple.wikipedia.org/wiki/%C3%81lvaro_Mej%C3%ADa_P%C3%A9rez
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=9
+ fetching http://simple.wikipedia.org/wiki/%C3%81lvaro_Lopes_Can%C3%A7ado
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=8
+ fetching http://simple.wikipedia.org/wiki/%2703_Bonnie_&_Clyde
+ -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=7
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=7
+ fetching http://simple.wikipedia.org/wiki/%C3%81lvaro_Arbeloa
+ -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=6
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
+ fetching http://simple.wikipedia.org/wiki/%C3%81lvaro_Recoba
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
+ fetching http://simple.wikipedia.org/wiki/%C3%81lvaro_Sabor%C3%ADo
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
+ * queue: http://simple.wikipedia.org
    maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392075853
+   now           = 1301392071230
+   0. http://simple.wikipedia.org/wiki/%27s-Hertogenbosch
+   1. http://simple.wikipedia.org/wiki/%60Abdu%27l-Bah%C3%A1
+   2. http://simple.wikipedia.org/wiki/%27N_Sync
+   3. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392075853
+   now           = 1301392072235
+   0. http://simple.wikipedia.org/wiki/%27s-Hertogenbosch
+   1. http://simple.wikipedia.org/wiki/%60Abdu%27l-Bah%C3%A1
+   2. http://simple.wikipedia.org/wiki/%27N_Sync
+   3. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392075853
+   now           = 1301392073253
+   0. http://simple.wikipedia.org/wiki/%27s-Hertogenbosch
+   1. http://simple.wikipedia.org/wiki/%60Abdu%27l-Bah%C3%A1
+   2. http://simple.wikipedia.org/wiki/%27N_Sync
+   3. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392075853
+   now           = 1301392074257
+   0. http://simple.wikipedia.org/wiki/%27s-Hertogenbosch
+   1. http://simple.wikipedia.org/wiki/%60Abdu%27l-Bah%C3%A1
+   2. http://simple.wikipedia.org/wiki/%27N_Sync
+   3. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392075853
+   now           = 1301392075261
+   0. http://simple.wikipedia.org/wiki/%27s-Hertogenbosch
+   1. http://simple.wikipedia.org/wiki/%60Abdu%27l-Bah%C3%A1
+   2. http://simple.wikipedia.org/wiki/%27N_Sync
+   3. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ fetching http://simple.wikipedia.org/wiki/%27s-Hertogenbosch
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392081185
+   now           = 1301392076263
+   0. http://simple.wikipedia.org/wiki/%60Abdu%27l-Bah%C3%A1
+   1. http://simple.wikipedia.org/wiki/%27N_Sync
+   2. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392081185
+   now           = 1301392077266
+   0. http://simple.wikipedia.org/wiki/%60Abdu%27l-Bah%C3%A1
+   1. http://simple.wikipedia.org/wiki/%27N_Sync
+   2. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392081185
+   now           = 1301392078271
+   0. http://simple.wikipedia.org/wiki/%60Abdu%27l-Bah%C3%A1
+   1. http://simple.wikipedia.org/wiki/%27N_Sync
+   2. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392081185
+   now           = 1301392079291
+   0. http://simple.wikipedia.org/wiki/%60Abdu%27l-Bah%C3%A1
+   1. http://simple.wikipedia.org/wiki/%27N_Sync
+   2. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392081185
+   now           = 1301392080295
+   0. http://simple.wikipedia.org/wiki/%60Abdu%27l-Bah%C3%A1
+   1. http://simple.wikipedia.org/wiki/%27N_Sync
+   2. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ fetching http://simple.wikipedia.org/wiki/%60Abdu%27l-Bah%C3%A1
+ -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=2
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
-   inProgress    = 0
+   inProgress    = 1
    crawlDelay    = 5000
    minCrawlDelay = 0
-   nextFetchTime = 1301348260190
+   nextFetchTime = 1301392081185
-   now           = 1301348259783
+   now           = 1301392081299
-   0. http://localhost:8080/qui/1.html
- fetching http://localhost:8080/qui/1.html
+   0. http://simple.wikipedia.org/wiki/%27N_Sync
+   1. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392086452
+   now           = 1301392082304
+   0. http://simple.wikipedia.org/wiki/%27N_Sync
+   1. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392086452
+   now           = 1301392083306
+   0. http://simple.wikipedia.org/wiki/%27N_Sync
+   1. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392086452
+   now           = 1301392084331
+   0. http://simple.wikipedia.org/wiki/%27N_Sync
+   1. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392086452
+   now           = 1301392085334
+   0. http://simple.wikipedia.org/wiki/%27N_Sync
+   1. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392086452
+   now           = 1301392086354
+   0. http://simple.wikipedia.org/wiki/%27N_Sync
+   1. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ fetching http://simple.wikipedia.org/wiki/%27N_Sync
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392091818
+   now           = 1301392087358
+   0. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392091818
+   now           = 1301392088361
+   0. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392091818
+   now           = 1301392089380
+   0. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392091818
+   now           = 1301392090382
+   0. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392091818
+   now           = 1301392091384
+   0. http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
+ fetching http://simple.wikipedia.org/wiki/%C3%81ngel_de_Saavedra,_Duke_of_Rivas
  -finishing thread FetcherThread, activeThreads=9
+ -finishing thread FetcherThread, activeThreads=8
+ -finishing thread FetcherThread, activeThreads=4
+ -finishing thread FetcherThread, activeThreads=3
+ -finishing thread FetcherThread, activeThreads=5
+ -finishing thread FetcherThread, activeThreads=6
+ -finishing thread FetcherThread, activeThreads=7
+ -finishing thread FetcherThread, activeThreads=1
+ -finishing thread FetcherThread, activeThreads=2
+ -finishing thread FetcherThread, activeThreads=0
+ -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
+ -activeThreads=0
+ Fetcher: finished at 2011-03-29 11:48:16, elapsed: 00:01:26
+ 
+ bin/nutch updatedb crawl/crawldb/0 crawl/segments/20110329114641
+ CrawlDb update: starting at 2011-03-29 11:48:19
+ CrawlDb update: db: crawl/crawldb/0
+ CrawlDb update: segments: [crawl/segments/20110329114641]
+ CrawlDb update: additions allowed: true
+ CrawlDb update: URL normalizing: false
+ CrawlDb update: URL filtering: false
+ CrawlDb update: Merging segment data into db.
+ CrawlDb update: finished at 2011-03-29 11:48:27, elapsed: 00:00:08
+ 
+ bin/nutch invertlinks crawl/linkdb -dir crawl/segments
+ LinkDb: starting at 2011-03-29 11:48:31
+ LinkDb: linkdb: crawl/linkdb
+ LinkDb: URL normalize: true
+ LinkDb: URL filter: true
+ LinkDb: adding segment: file:/Users/simpatico/nutch-1.2/crawl/segments/20110329114641
+ LinkDb: finished at 2011-03-29 11:48:37, elapsed: 00:00:05
+ 
+ bin/nutch solrindex http://localhost:8080/solr crawl/crawldb/0 crawl/linkdb crawl/segments/20110329114641
+ SolrIndexer: starting at 2011-03-29 11:48:40
+ SolrIndexer: finished at 2011-03-29 11:48:53, elapsed: 00:00:13
+ 
+ Deleted file:/Users/simpatico/nutch-1.2/crawl/segments/20110329114641
+ 
+ 
+ generate-fetch-updatedb-invertlinks-index-merge iteration 1:
+ bin/nutch generate crawl/crawldb/0 crawl/segments -topN 15
+ Generator: starting at 2011-03-29 11:49:00
+ Generator: Selecting best-scoring urls due for fetch.
+ Generator: filtering: true
+ Generator: normalizing: true
+ Generator: topN: 15
+ Generator: jobtracker is 'local', generating exactly one partition.
+ Generator: Partitioning selected urls for politeness.
+ Generator: segment: crawl/segments/20110329114912
+ Generator: finished at 2011-03-29 11:49:17, elapsed: 00:00:17
+ 
+ bin/nutch fetch crawl/segments/20110329114912
+ Fetcher: starting at 2011-03-29 11:49:20
+ Fetcher: segment: crawl/segments/20110329114912
+ Fetcher: threads: 10
+ QueueFeeder finished: total 15 records + hit by time limit :0
+ fetching http://bits.wikimedia.org/skins-1.17/vector/images/search-ltr.png?301-2
+ fetching http://creativecommons.org/licenses/by-sa/3.0/
+ fetching http://simple.wikipedia.org/wiki/Special:SpecialPages
+ -activeThreads=10, spinWaiting=7, fetchQueues.totalSize=12
+ -activeThreads=10, spinWaiting=7, fetchQueues.totalSize=12
+ -activeThreads=10, spinWaiting=7, fetchQueues.totalSize=12
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=12
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=12
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=12
+ fetching http://bits.wikimedia.org/simple.wikipedia.org/load.php?debug=false&lang=en&modules=mediawiki.legacy.commonPrint%7Cmediawiki.legacy.shared%7Cskins.vector&only=styles&skin=vector
+ fetching http://simple.wikipedia.org/w/api.php?action=rsd
+ fetching http://simple.wikipedia.org/w/opensearch_desc.php
+ fetching http://simple.wikipedia.org/wiki/Special:Random
+ fetching http://simple.wikipedia.org/w/index.php?title=Special:RecentChanges&feed=atom
+ fetching http://simple.wikipedia.org/wiki/Special:RecentChanges
+ -activeThreads=10, spinWaiting=8, fetchQueues.totalSize=6
+ Error parsing: http://bits.wikimedia.org/simple.wikipedia.org/load.php?debug=false&lang=en&modules=mediawiki.legacy.commonPrint%7Cmediawiki.legacy.shared%7Cskins.vector&only=styles&skin=vector: failed(2,0): Can't retrieve Tika parser for mime-type text/css
+ -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=6
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=6
+ fetching http://simple.wikipedia.org/wiki/Main_Page
+ -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=5
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=5
+ fetching http://simple.wikipedia.org/wiki/Special:Categories
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392192406
+   now           = 1301392187826
+   0. http://simple.wikipedia.org/wiki/Category:Sportspeople_stubs
+   1. http://simple.wikipedia.org/wiki/Wikipedia:Simple_talk
+   2. http://simple.wikipedia.org/wiki/Wikipedia:Simple_start
+   3. http://simple.wikipedia.org/wiki/Help:Contents
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392192406
+   now           = 1301392188827
+   0. http://simple.wikipedia.org/wiki/Category:Sportspeople_stubs
+   1. http://simple.wikipedia.org/wiki/Wikipedia:Simple_talk
+   2. http://simple.wikipedia.org/wiki/Wikipedia:Simple_start
+   3. http://simple.wikipedia.org/wiki/Help:Contents
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392192406
+   now           = 1301392189830
+   0. http://simple.wikipedia.org/wiki/Category:Sportspeople_stubs
+   1. http://simple.wikipedia.org/wiki/Wikipedia:Simple_talk
+   2. http://simple.wikipedia.org/wiki/Wikipedia:Simple_start
+   3. http://simple.wikipedia.org/wiki/Help:Contents
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392192406
+   now           = 1301392190848
+   0. http://simple.wikipedia.org/wiki/Category:Sportspeople_stubs
+   1. http://simple.wikipedia.org/wiki/Wikipedia:Simple_talk
+   2. http://simple.wikipedia.org/wiki/Wikipedia:Simple_start
+   3. http://simple.wikipedia.org/wiki/Help:Contents
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=4
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392192406
+   now           = 1301392191911
+   0. http://simple.wikipedia.org/wiki/Category:Sportspeople_stubs
+   1. http://simple.wikipedia.org/wiki/Wikipedia:Simple_talk
+   2. http://simple.wikipedia.org/wiki/Wikipedia:Simple_start
+   3. http://simple.wikipedia.org/wiki/Help:Contents
+ fetching http://simple.wikipedia.org/wiki/Category:Sportspeople_stubs
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392197582
+   now           = 1301392192918
+   0. http://simple.wikipedia.org/wiki/Wikipedia:Simple_talk
+   1. http://simple.wikipedia.org/wiki/Wikipedia:Simple_start
+   2. http://simple.wikipedia.org/wiki/Help:Contents
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392197582
+   now           = 1301392193931
+   0. http://simple.wikipedia.org/wiki/Wikipedia:Simple_talk
+   1. http://simple.wikipedia.org/wiki/Wikipedia:Simple_start
+   2. http://simple.wikipedia.org/wiki/Help:Contents
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392197582
+   now           = 1301392194934
+   0. http://simple.wikipedia.org/wiki/Wikipedia:Simple_talk
+   1. http://simple.wikipedia.org/wiki/Wikipedia:Simple_start
+   2. http://simple.wikipedia.org/wiki/Help:Contents
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392197582
+   now           = 1301392195936
+   0. http://simple.wikipedia.org/wiki/Wikipedia:Simple_talk
+   1. http://simple.wikipedia.org/wiki/Wikipedia:Simple_start
+   2. http://simple.wikipedia.org/wiki/Help:Contents
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=3
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392197582
+   now           = 1301392196939
+   0. http://simple.wikipedia.org/wiki/Wikipedia:Simple_talk
+   1. http://simple.wikipedia.org/wiki/Wikipedia:Simple_start
+   2. http://simple.wikipedia.org/wiki/Help:Contents
+ fetching http://simple.wikipedia.org/wiki/Wikipedia:Simple_talk
+ -activeThreads=10, spinWaiting=9, fetchQueues.totalSize=2
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 1
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392197582
+   now           = 1301392197942
+   0. http://simple.wikipedia.org/wiki/Wikipedia:Simple_start
+   1. http://simple.wikipedia.org/wiki/Help:Contents
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392203047
+   now           = 1301392198944
+   0. http://simple.wikipedia.org/wiki/Wikipedia:Simple_start
+   1. http://simple.wikipedia.org/wiki/Help:Contents
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392203047
+   now           = 1301392199946
+   0. http://simple.wikipedia.org/wiki/Wikipedia:Simple_start
+   1. http://simple.wikipedia.org/wiki/Help:Contents
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392203047
+   now           = 1301392200990
+   0. http://simple.wikipedia.org/wiki/Wikipedia:Simple_start
+   1. http://simple.wikipedia.org/wiki/Help:Contents
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392203047
+   now           = 1301392202049
+   0. http://simple.wikipedia.org/wiki/Wikipedia:Simple_start
+   1. http://simple.wikipedia.org/wiki/Help:Contents
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=2
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392203047
+   now           = 1301392203054
+   0. http://simple.wikipedia.org/wiki/Wikipedia:Simple_start
+   1. http://simple.wikipedia.org/wiki/Help:Contents
+ fetching http://simple.wikipedia.org/wiki/Wikipedia:Simple_start
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392208298
+   now           = 1301392204057
+   0. http://simple.wikipedia.org/wiki/Help:Contents
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392208298
+   now           = 1301392205059
+   0. http://simple.wikipedia.org/wiki/Help:Contents
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392208298
+   now           = 1301392206061
+   0. http://simple.wikipedia.org/wiki/Help:Contents
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392208298
+   now           = 1301392207064
+   0. http://simple.wikipedia.org/wiki/Help:Contents
+ -activeThreads=10, spinWaiting=10, fetchQueues.totalSize=1
+ * queue: http://simple.wikipedia.org
+   maxThreads    = 1
+   inProgress    = 0
+   crawlDelay    = 5000
+   minCrawlDelay = 0
+   nextFetchTime = 1301392208298
+   now           = 1301392208066
+   0. http://simple.wikipedia.org/wiki/Help:Contents
+ fetching http://simple.wikipedia.org/wiki/Help:Contents
+ -finishing thread FetcherThread, activeThreads=8
  -finishing thread FetcherThread, activeThreads=8
  -finishing thread FetcherThread, activeThreads=7
  -finishing thread FetcherThread, activeThreads=6
+ -finishing thread FetcherThread, activeThreads=4
  -finishing thread FetcherThread, activeThreads=5
- -finishing thread FetcherThread, activeThreads=3
  -finishing thread FetcherThread, activeThreads=3
  -finishing thread FetcherThread, activeThreads=2
  -finishing thread FetcherThread, activeThreads=1
  -finishing thread FetcherThread, activeThreads=0
  -activeThreads=0, spinWaiting=0, fetchQueues.totalSize=0
  -activeThreads=0
- Fetcher: finished at 2011-03-28 23:37:41, elapsed: 00:00:10
+ Fetcher: finished at 2011-03-29 11:50:11, elapsed: 00:00:50
  
- bin/nutch updatedb crawl/crawldb/0/0 crawl/segments/20110328233727
+ bin/nutch updatedb crawl/crawldb/0 crawl/segments/20110329114912
- CrawlDb update: starting at 2011-03-28 23:37:43
+ CrawlDb update: starting at 2011-03-29 11:50:15
- CrawlDb update: db: crawl/crawldb/0/0
+ CrawlDb update: db: crawl/crawldb/0
- CrawlDb update: segments: [crawl/segments/20110328233727]
+ CrawlDb update: segments: [crawl/segments/20110329114912]
  CrawlDb update: additions allowed: true
  CrawlDb update: URL normalizing: false
  CrawlDb update: URL filtering: false
  CrawlDb update: Merging segment data into db.
- CrawlDb update: finished at 2011-03-28 23:37:47, elapsed: 00:00:04
+ CrawlDb update: finished at 2011-03-29 11:50:25, elapsed: 00:00:09
  
  bin/nutch invertlinks crawl/linkdb -dir crawl/segments
- LinkDb: starting at 2011-03-28 23:37:49
+ LinkDb: starting at 2011-03-29 11:50:28
  LinkDb: linkdb: crawl/linkdb
  LinkDb: URL normalize: true
  LinkDb: URL filter: true
- LinkDb: adding segment: file:/Users/simpatico/nutch-1.2/crawl/segments/20110328233727
+ LinkDb: adding segment: file:/Users/simpatico/nutch-1.2/crawl/segments/20110329114912
+ LinkDb: merging with existing linkdb: crawl/linkdb
- LinkDb: finished at 2011-03-28 23:37:52, elapsed: 00:00:03
+ LinkDb: finished at 2011-03-29 11:50:40, elapsed: 00:00:12
  
- bin/nutch solrindex http://localhost:8080/solr crawl/crawldb/0/0 crawl/linkdb crawl/segments/20110328233727
+ bin/nutch solrindex http://localhost:8080/solr crawl/crawldb/0 crawl/linkdb crawl/segments/20110329114912
- SolrIndexer: starting at 2011-03-28 23:37:53
+ SolrIndexer: starting at 2011-03-29 11:50:43
- SolrIndexer: finished at 2011-03-28 23:38:00, elapsed: 00:00:06
+ SolrIndexer: finished at 2011-03-29 11:50:56, elapsed: 00:00:13
  
+ Deleted file:/Users/simpatico/nutch-1.2/crawl/segments/20110329114912
  
- bin/nutch readdb crawl/crawldb/0/0 -stats
+ bin/nutch readdb crawl/crawldb/0 -stats
- CrawlDb statistics start: crawl/crawldb/0/0
+ CrawlDb statistics start: crawl/crawldb/0
- Statistics for CrawlDb: crawl/crawldb/0/0
+ Statistics for CrawlDb: crawl/crawldb/0
- TOTAL urls:	2
+ TOTAL urls:	1261
- retry 0:	2
+ retry 0:	1261
- min score:	1.0
+ min score:	0.0010
- avg score:	1.0
+ avg score:	0.02382157
- max score:	1.0
+ max score:	1.24
+ status 1 (db_unfetched):	1231
- status 2 (db_fetched):	2
+ status 2 (db_fetched):	26
+ status 3 (db_gone):	4
  CrawlDb statistics: done
  
- Deleted file:/Users/simpatico/nutch-1.2/urls-input/MR6/it_seeds
+ Deleted file:/Users/simpatico/nutch-1.2/seeds/MR6/it_seeds
  
  }}}
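
As a quick sanity check on the updated output, the per-status counts printed by `bin/nutch readdb crawl/crawldb/0 -stats` can be added up and compared with the reported total. The following is only a minimal sketch, not part of the test script: the helper name `parse_stats` and the embedded `STATS` text (copied from the new output above) are illustrative assumptions, and the script only inspects text that was already printed.

```python
import re

# Counts copied from the "+" side of the readdb -stats output above.
STATS = """\
TOTAL urls: 1261
retry 0: 1261
status 1 (db_unfetched): 1231
status 2 (db_fetched): 26
status 3 (db_gone): 4
"""


def parse_stats(text):
    """Return (total, {status_name: count}) from readdb -stats style lines.

    Hypothetical helper for illustration; it is not part of Nutch.
    """
    total = None
    statuses = {}
    for line in text.splitlines():
        m = re.match(r"\s*TOTAL urls:\s*(\d+)", line)
        if m:
            total = int(m.group(1))
            continue
        m = re.match(r"\s*status \d+ \((\w+)\):\s*(\d+)", line)
        if m:
            statuses[m.group(1)] = int(m.group(2))
    return total, statuses


total, statuses = parse_stats(STATS)

# Every URL in the crawl db should carry exactly one status.
counts_add_up = sum(statuses.values()) == total

# URLs that were actually attempted (fetched or gone) across both iterations.
attempted = statuses["db_fetched"] + statuses["db_gone"]

print(total, statuses, counts_add_up, attempted)
```

In this run the three status counts sum to the reported total, and the number of attempted URLs (fetched plus gone) equals the two iterations times the `-topN 15` limit shown in the `generate` commands. That second relation is an observation about this particular log, not a guarantee for other crawls.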
  
