nutch-dev mailing list archives

From "Ilguiz Latypov (JIRA)" <j...@apache.org>
Subject [jira] Issue Comment Edited: (NUTCH-427) protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.
Date Sat, 08 Nov 2008 04:50:44 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12645933#action_12645933 ]

ilatypov edited comment on NUTCH-427 at 11/7/08 8:50 PM:
---------------------------------------------------------------

Fixed reading of SMB files, updated to jcifs 1.3.0, enhanced the smoke
test app.  Protected special characters such as apostrophe and hash 
mark with URL encoding.
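
For illustration only, here is a minimal sketch of the kind of URL encoding
described above. It is not the code from the attached patch. The class and
method names (SmbUrlEncoder, encodeSegment, encodePath) are hypothetical, and
the rule of escaping everything outside the RFC 3986 unreserved set is an
assumption chosen so that the apostrophe and hash mark are both escaped.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of protocol-smb or JCIFS.
final class SmbUrlEncoder {
    // RFC 3986 "unreserved" characters; every other byte is percent-encoded.
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    /** Percent-encodes a single path segment (the segment must not contain '/'). */
    static String encodeSegment(String segment) {
        StringBuilder out = new StringBuilder();
        for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xFF);
            if (UNRESERVED.indexOf(c) >= 0) {
                out.append(c);
            } else {
                // Escape the raw UTF-8 byte as %XX.
                out.append(String.format("%%%02X", b & 0xFF));
            }
        }
        return out.toString();
    }

    /** Encodes each segment of a '/'-separated path and keeps the separators. */
    static String encodePath(String path) {
        String[] parts = path.split("/", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                out.append('/');
            }
            out.append(encodeSegment(parts[i]));
        }
        return out.toString();
    }
}
```

With this sketch, encodePath("/share/Bob's #1.txt") yields
"/share/Bob%27s%20%231.txt", so the hash mark is no longer read as the start
of a URL fragment.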

Fixed the infinite retry loop in SMB.java.

Tried but could not activate the Apache logging.


      was (Author: ilatypov):
    Fixed reading of SMB files, updated to jcifs 1.3.0, enhanced the smoke
test app.

Tried but could not activate the Apache logging.

  
> protocol-smb: plugin protocol implementing the CIFS/SMB protocol. This protocol allows
> Nutch to crawl Microsoft Windows Shares remotely using the CIFS/SMB protocol implementation.
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NUTCH-427
>                 URL: https://issues.apache.org/jira/browse/NUTCH-427
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8.1, 0.9.0, 1.0.0
>         Environment: JAVA - OS independent
>            Reporter: Armel Nene
>            Priority: Minor
>         Attachments: protocol-smb-diff.txt, protocol-smb.zip, protocol-smb.zip, protocol-smb.zip
>
>
> Title:    protocol-smb - Nutch protocol plugin for crawling Microsoft Windows shares
> Author:   Armel T. Nene 
> Update:   Vadim Bauer
> Email:    armel.nene NOSPAM-AT-NOSPAM idna-solutions.com, V a d i m B a u e r <AT> g m x . d e
> A.  Introduction
>     The protocol-smb plugin allows you to crawl Microsoft Windows shares. It implements
>     the CIFS/SMB protocol, which is commonly used on Microsoft operating systems. The
>     plugin replicates the behaviour of protocol-file over CIFS/SMB. It uses the JCIFS
>     library and supports all of that library's properties.
>     More information is available at http://jcifs.samba.org/
>     The SMB URL syntax for crawling is smb://xxxxx (e.g. smb://server/share).
>     
> B.  Installation
>     1) Binaries only:   The protocol-smb files can be found in the ../plugins directory.
>                         Copy the "protocol-smb" folder to the NUTCHHOME/build/plugins directory.
>                         Put the "smb.properties" file in the NUTCHHOME/conf directory.
>                         Configure the properties in the "smb.properties" file.
>                         Enable the plugin by updating the "nutch-site.xml" file found in
>                         the NUTCHHOME/conf directory,
>                         e.g. <property>
>                                <name>plugin.includes</name>
>                                <value>protocol-smb| other plugins...</value>
>                                <description>
>                                </description>
>                              </property>
>     2) Source code:     The protocol-smb sources can be found in the ../src directory.
>                         Always refer to the Nutch wiki for detailed instructions on building Nutch. In short:
>                         Copy the 'protocol-smb' folder to NUTCHHOME/src/plugin.
>                         Update the build.xml in NUTCHHOME/src/plugin to include the plugin.
>                         Update the NUTCHHOME/default.properties file to include the plugin.
>                         Run ant to build.
>                         Copy the 'smb.properties' file to NUTCHHOME/conf, and configure the properties.
>                         Enable the plugin by updating the nutch-site.xml file.
> C.  Known Issues
>     1) MalformedURLException: unknown protocol: smb
>        The SMB URL protocol handler has not been installed successfully.
>        In short, the JCIFS jar must be loaded by the system class loader.
>        Workaround: a) As a short-term solution, install the JCIFS jar
>                       found in the protocol-smb folder into
>                       JDKHOME/jre/lib/ext and/or JREHOME/lib/ext.
>                    b) If the exception is still thrown after completing step a),
>                       set the system property by passing the following argument
>                       to the JVM:
>                       -Djava.protocol.handler.pkgs=jcifs
>                    c) You can also set the property in your code. For example, if
>                       you start crawling with org.apache.nutch.crawl.Crawl,
>                       add the following two lines. This has the same effect as b).
>                       public static void main(String args[]) throws Exception {
>                           System.setProperty("java.protocol.handler.pkgs", "jcifs");
>                           new java.util.PropertyPermission("java.protocol.handler.pkgs", "read, write");
>                           // and so on
>        See also the FAQ page: http://jcifs.samba.org/src/docs/faq.html
>     2) FATAL smb.SMB - Could not read content of protocol: smb://xxxxxx
>        This problem usually occurs when the following properties are not set correctly in
>        the "smb.properties" file:
>        - username
>        - password
>        - domain
>        For the list of available properties and how to set them, see:
>        http://jcifs.samba.org/src/docs/api/overview-summary.html#scp
>        See also the FAQ page: http://jcifs.samba.org/src/docs/faq.html
>        N.B. All properties should be set in the "smb.properties" file. You can set
>             all supported JCIFS properties in the "smb.properties" file.
>      
>     3) Only tested on Windows XP and Windows Server 2003. Please report test
>        results on other operating systems.
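
Known issues 1 and 2 above can be checked before starting a crawl. The
following is a minimal sketch of such a check, not part of the plugin. The
class SmbSetupCheck and its methods are hypothetical names introduced here.
The property keys used in the usage note are an assumption based on the
standard JCIFS client property names; the README itself only lists
"username", "password" and "domain". The sketch uses only the JDK, so it runs
without the JCIFS jar.

```java
import java.io.IOException;
import java.io.StringReader;
import java.net.MalformedURLException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check, not part of protocol-smb or JCIFS.
final class SmbSetupCheck {

    /** Returns true if this JVM has a URL protocol handler for the scheme. */
    static boolean isProtocolSupported(String scheme) {
        try {
            // Throws MalformedURLException ("unknown protocol: ...") when no
            // handler is installed for the scheme (known issue 1).
            URI.create(scheme + "://server/share/").toURL();
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    /** Parses properties-file text; a real caller would read smb.properties from disk. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory reader.
            throw new IllegalStateException(e);
        }
        return props;
    }

    /** Returns the required keys that are absent or blank (known issue 2). */
    static List<String> missingKeys(Properties props, String... requiredKeys) {
        List<String> missing = new ArrayList<>();
        for (String key : requiredKeys) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

A caller might run isProtocolSupported("smb") and, assuming the JCIFS key
names, missingKeys(props, "jcifs.smb.client.username",
"jcifs.smb.client.password", "jcifs.smb.client.domain"). A false result or a
non-empty list points at issue 1 or issue 2 respectively. Passing the check
does not guarantee that the credentials are accepted by the server.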

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

