nutch-dev mailing list archives

From Mr Shore <shore.cl...@gmail.com>
Subject org.apache.nutch.protocol.file.FileError: File Error: 404
Date Fri, 05 Jun 2009 04:43:28 GMT
During crawling, I see many reports of
org.apache.nutch.protocol.file.FileError: File Error: 404, all of them
for locations with a space in the path.
I'm using Nutch 0.9.
Is this really a bug? Is there a patch for it?

Here is part of the error logs:
/usr/local/apache2/resumes_txt/50/Summit
Point/Marissafolli/Receptionist/Administrative Assistant /Marissa
org.apache.nutch.protocol.file.FileError: File Error: 404
        at org.apache.nutch.protocol.file.File.getProtocolOutput(File.java:100)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:145)
org.apache.nutch.protocol.file.FileError: File Error: 404
        at org.apache.nutch.protocol.file.File.getProtocolOutput(File.java:100)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:145)

The file does exist on disk; its exact path is:
[root@file ~]# ls /usr/local/apache2/resumes_txt/50/Summit\
Point/Marissafolli/Receptionist/Administrative\ Assistant\
/Marissa\'s\ Resume.txt.txt
/usr/local/apache2/resumes_txt/50/Summit
Point/Marissafolli/Receptionist/Administrative Assistant /Marissa's
Resume.txt.txt

It seems Nutch fails to parse the URL?
I'm using the file protocol.
Sample URL from the fetch log:
fetching file:////usr/local/apache2/resumes_txt/50/Ronceverte/tonyobrien/Owner/Operator/Anthony
O
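
To illustrate the suspected space problem: this is only a sketch using the standard java.net.URI class, not Nutch's own code, and it does not confirm where Nutch 0.9 goes wrong. The class name FileUrlSketch and both helper methods are hypothetical. It shows that a path with spaces survives a round trip when the spaces are percent-encoded as %20 in the URL and decoded again before the file is opened.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class for illustration; not part of Nutch.
public class FileUrlSketch {

    // Build a file: URL from an absolute local path. The multi-argument
    // URI constructor percent-encodes illegal characters such as spaces.
    static String toFileUrl(String absolutePath) throws URISyntaxException {
        return new URI("file", null, absolutePath, null).toASCIIString();
    }

    // Recover the local path. URI.getPath() decodes %20 back to a space.
    static String toLocalPath(String fileUrl) throws URISyntaxException {
        return new URI(fileUrl).getPath();
    }

    public static void main(String[] args) throws URISyntaxException {
        String path = "/usr/local/apache2/resumes_txt/50/Summit Point/Marissafolli/"
                + "Receptionist/Administrative Assistant /Marissa's Resume.txt.txt";
        String url = toFileUrl(path);
        System.out.println(url);
        // Check that decoding the URL gives back the original path.
        System.out.println(toLocalPath(url).equals(path));
    }
}
```

If the seed URLs are written with %20 in place of each space and the 404s persist, that would suggest the path is not being decoded before the file lookup; if they disappear, the raw spaces in the URL were the trigger. Either outcome is a guess until checked against the File.java source.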



