nutch-commits mailing list archives

From sna...@apache.org
Subject [nutch] branch master updated: NUTCH-2680 Documentation: https supported by multiple protocol plugins not only httpclient Improve description of property plugin.includes: - https is supported by default - no need to enable the stub plugin nutch-extensionpoints
Date Fri, 18 Jan 2019 15:26:24 GMT
This is an automated email from the ASF dual-hosted git repository.

snagel pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/nutch.git


The following commit(s) were added to refs/heads/master by this push:
     new 9ae7a80  NUTCH-2680 Documentation: https supported by multiple protocol plugins not
only httpclient Improve description of property plugin.includes: - https is supported by default
- no need to enable the stub plugin nutch-extensionpoints
     new 0c18f6c  Merge pull request #426 from sebastian-nagel/NUTCH-2680
9ae7a80 is described below

commit 9ae7a8049c246aa638328605a8ce0922e48dddf6
Author: Sebastian Nagel <snagel@apache.org>
AuthorDate: Mon Jan 7 12:16:10 2019 +0100

    NUTCH-2680 Documentation: https supported by multiple protocol plugins not only httpclient
    Improve description of property plugin.includes:
    - https is supported by default
    - no need to enable the stub plugin nutch-extensionpoints
---
 conf/nutch-default.xml | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/conf/nutch-default.xml b/conf/nutch-default.xml
index 00cb845..913f901 100644
--- a/conf/nutch-default.xml
+++ b/conf/nutch-default.xml
@@ -1360,11 +1360,11 @@
   <value>protocol-http|urlfilter-(regex|validator)|parse-(html|tika)|index-(basic|anchor)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
   <description>Regular expression naming plugin directory names to
   include.  Any plugin not matching this expression is excluded.
-  In any case you need at least include the nutch-extensionpoints plugin. By
-  default Nutch includes crawling just HTML and plain text via HTTP,
-  and basic indexing and search plugins. In order to use HTTPS please enable 
-  protocol-httpclient, but be aware of possible intermittent problems with the 
-  underlying commons-httpclient library. Set parsefilter-naivebayes for classification based focused crawler.
+  By default Nutch includes plugins to crawl HTML and various other
+  document formats via HTTP/HTTPS and indexing the crawled content
+  into Solr.  More plugins are available to support more indexing
+  backends, to fetch ftp:// and file:// URLs, for focused crawling,
+  and many other use cases.
   </description>
 </property>
 


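The value of plugin.includes is a regular expression matched against plugin names. The short Java sketch below checks a few names against the default value from the diff above. It assumes the whole name must match (java.util.regex `Matcher.matches()` semantics), so `protocol-http` selects only that plugin and not `protocol-httpclient`. The class name `PluginIncludesCheck` and the sample plugin list are hypothetical, and this is not Nutch's own plugin-loading code.

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: which plugin names would a plugin.includes value select?
 * Assumes the expression must match the entire name (Matcher.matches()).
 */
public class PluginIncludesCheck {

  // Default value of plugin.includes, copied from the diff above.
  static final String DEFAULT_INCLUDES =
      "protocol-http|urlfilter-(regex|validator)|parse-(html|tika)"
      + "|index-(basic|anchor)|indexer-solr|scoring-opic"
      + "|urlnormalizer-(pass|regex|basic)";

  /** Returns true if pluginName matches the includes expression in full. */
  static boolean isIncluded(String includes, String pluginName) {
    return Pattern.compile(includes).matcher(pluginName).matches();
  }

  public static void main(String[] args) {
    // Hypothetical sample of plugin names to check against the default value.
    List<String> names = List.of(
        "protocol-http", "protocol-httpclient", "nutch-extensionpoints",
        "parse-tika", "indexer-solr", "parsefilter-naivebayes");
    for (String name : names) {
      System.out.println(name + " -> " + isIncluded(DEFAULT_INCLUDES, name));
    }
  }
}
```

Run as-is, this prints `true` for protocol-http, parse-tika and indexer-solr, and `false` for protocol-httpclient, nutch-extensionpoints and parsefilter-naivebayes. That is consistent with the new description: HTTPS needs no extra plugin, and the nutch-extensionpoints stub does not need to be listed.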