nutch-dev mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Nutch Wiki] Update of "PluginCentral" by johnroman
Date Wed, 26 Nov 2008 20:26:48 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by johnroman:
http://wiki.apache.org/nutch/PluginCentral

------------------------------------------------------------------------------
   * WritingPluginExample - A step-by-step example of how to write a plugin for the 0.7 branch, updated by LucasBoullosa
   * [http://wiki.media-style.com/display/nutchDocu/Write+a+plugin Writing Plugins] - by Stefan
  
- == Plugins that Come with Nutch (0.7) ==
+ == Plugins that Come with Nutch (0.9) ==
  
  To have Nutch use any of these plugins, edit your conf/nutch-site.xml file and add the plugin's name to the plugin.includes property (a regular expression of plugin names).
  
@@ -24, +24 @@

   * '''parse-html''' - Parses HTML documents
   * '''parse-js''' - Parses Java``Script
   * '''parse-mp3''' - Parses MP3s
+  * '''parse-zip''' - Parses ZIP archives
+  * '''parse-mspowerpoint''' - Parses Microsoft PowerPoint files
   * '''parse-msword''' - Parses MS Word documents
+  * '''parse-msexcel''' - Parses MS Excel documents
   * '''parse-pdf''' - Parses PDFs
   * '''parse-rss''' - Parses RSS feeds
+  * '''parse-oo''' - Parses OpenOffice files
+  * '''parse-swf''' - Parses Shockwave Flash
   * '''parse-rtf''' - Parses RTF files
   * '''parse-text''' - Parses text documents
   * '''protocol-file''' - Retrieves documents from the filesystem
@@ -47, +52 @@

   * '''lib-commons-httpclient'''
   * '''lib-http'''
   * '''lib-jakarta-poi'''
-  * '''lib-log4j'''
+  * '''lib-log4j''' 
-  * '''lib-lucene-analyzers'''
+  * '''lib-lucene-analyzers''' - Lucene analyzers
-  * '''lib-nekohtml'''
-  * '''lib-parsems'''
+  * '''lib-nekohtml''' - HTML scanner and automatic tag balancer
+  * '''lib-parsems''' - framework for parsing Microsoft Office documents
   * '''parse-msexcel''' - Parses MS Excel documents
   * '''parse-mspowerpoint''' - Parses MS PowerPoint documents
   * '''parse-oo''' - Parses OpenOffice and StarOffice documents (Extensions: ODT, OTT, ODH, ODM, ODS, OTS, ODP, OTP, SXW, STW, SXC, STC, SXI, STI)
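
The page says to enable a plugin by adding its name to plugin.includes in conf/nutch-site.xml. Because that property is a regular expression over plugin IDs rather than a literal list, the Python sketch below shows which IDs a given value would select and how the override property might be written out. It assumes whole-ID matching (like Java's Matcher.matches()); the PLUGIN_INCLUDES value and the helper names enabled_plugins and nutch_site_xml are hypothetical illustrations, not Nutch's shipped default or API.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical example value, NOT necessarily Nutch's shipped default.
PLUGIN_INCLUDES = (
    "protocol-http|urlfilter-regex|parse-(text|html|js|pdf|msword)"
    "|index-basic|query-(basic|site|url)|summary-basic|scoring-opic"
)


def enabled_plugins(includes_regex, plugin_ids):
    """Return the plugin IDs selected by a plugin.includes expression.

    Assumes the whole ID must match, similar to Java's Matcher.matches().
    """
    pattern = re.compile(includes_regex)
    return [pid for pid in plugin_ids if pattern.fullmatch(pid)]


def nutch_site_xml(includes_regex):
    """Build a minimal conf/nutch-site.xml body overriding plugin.includes."""
    config = ET.Element("configuration")
    prop = ET.SubElement(config, "property")
    ET.SubElement(prop, "name").text = "plugin.includes"
    ET.SubElement(prop, "value").text = includes_regex
    return ET.tostring(config, encoding="unicode")


if __name__ == "__main__":
    ids = ["parse-html", "parse-pdf", "parse-mp3", "parse-rtf",
           "protocol-file", "protocol-http"]
    print(enabled_plugins(PLUGIN_INCLUDES, ids))
    # ['parse-html', 'parse-pdf', 'protocol-http']
    print(nutch_site_xml(PLUGIN_INCLUDES))
```

Under this assumption, enabling parse-rtf or protocol-file means extending the alternation (for example parse-(text|html|js|pdf|msword|rtf)) rather than appending a comma-separated name.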
