nutch-dev mailing list archives

From "Otis Gospodnetic (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (NUTCH-659) Help! No urls fetched for internal repository website
Date Wed, 12 Nov 2008 05:03:44 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Otis Gospodnetic resolved NUTCH-659.
------------------------------------

    Resolution: Invalid

Please ask questions on the mailing list.

> Help! No urls fetched for internal repository website
> -----------------------------------------------------
>
>                 Key: NUTCH-659
>                 URL: https://issues.apache.org/jira/browse/NUTCH-659
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher
>    Affects Versions: 0.9.0
>         Environment: nutch 0.9, TOMCAT6.0.18, JAVA 1.6.0_10, CentOS 5.2
>            Reporter: Bryan
>            Priority: Critical
>
> I am new to Nutch and have set it up to search my company's internal websites. The
> version is nutch-2008-11-02_04-01-26.tar.
>  
> My internal company websites include several HTTP websites.
> Another is an SVN repository website served over HTTPS, with an XML structure that
> uses <dir> and <file> tags.
>  
> Search over the HTTP websites works well.
> HTTPS itself is OK: some links on those HTTP websites point to Word files under the
> SVN website, and those files can be indexed.
>  
> However, Nutch does not search my SVN website. If I point it only at the SVN website,
> the result is always: 0 urls fetched.
>  
> My nutch-site.xml is as follows:
> <property>
>   <name>plugin.includes</name>
>   <value>protocol-httpclient|urlfilter-regex|parse-(text|html|js|msexcel|msword|mspowerpoint|pdf|zip|swf|rss)|index-(basic|anchor)|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> </property>
>  
> # skip file:, ftp:, & mailto: urls
> -^(ftp|mailto):
>  
> # accept hosts in MY.DOMAIN.NAME
> +^http://([a-z0-9]*\.)*smartlabs.com.au/
>  
> Any help would be much appreciated. Thanks in advance.
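
One thing worth checking in the filter rules quoted above: the accept rule starts with ^http://, so a URL beginning with https:// would not match it. The sketch below replays the two quoted rules in Python under the assumption that urlfilter-regex applies the first matching rule and rejects a URL that matches no rule. The host names are hypothetical, and the report does not confirm that this is the actual cause. The reporter notes that Word files under the SVN site do get indexed, so a different filter file may be in effect.

```python
import re

# Rules copied from the report as (accept?, pattern) pairs.
# Assumption: first matching rule wins; no match means the URL is rejected.
RULES = [
    (False, re.compile(r"^(ftp|mailto):")),
    (True, re.compile(r"^http://([a-z0-9]*\.)*smartlabs.com.au/")),
]


def accepts(url, rules=RULES):
    """Return True if the rule list would let this URL through."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


# Hypothetical host names, for illustration only.
print(accepts("http://www.smartlabs.com.au/index.html"))  # True
print(accepts("https://svn.smartlabs.com.au/repo/"))      # False

# A variant rule that also admits https (dots escaped for strictness).
FIXED = [
    RULES[0],
    (True, re.compile(r"^https?://([a-z0-9]*\.)*smartlabs\.com\.au/")),
]
print(accepts("https://svn.smartlabs.com.au/repo/", FIXED))  # True
```

If this turns out to be the problem, changing http:// to https?:// in the accept rule would be the corresponding edit in the filter file.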

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

