manifoldcf-user mailing list archives

From Erlend Garåsen <e.f.gara...@usit.uio.no>
Subject Re: Hop count problem
Date Mon, 12 Aug 2013 12:16:41 GMT
On 8/12/13 1:31 PM, Karl Wright wrote:

> Based on your report that the test environment works OK, and the
> production environment does not, I expect there is something like this
> going on.  I know you attempted to fetch the intervening document from
> your test environment, but it is conceivable that the production
> environment is unable to get it.  You should see evidence of that in the
> simple history, if so.

I have looked through the complete history regarding this host, and none 
of the other documents have ever been fetched. The only thing I can see 
is an illegal robots.txt file:
robots parse 	www.ibsen.uio.no:80 	HTML 	0 	1 	Robots file contained HTML, skipped

I don't think this robots file has stopped MCF from crawling the other
documents, since I can see the same entry in our test environment as
well. I even tried disabling robots.txt checks, but the problem persists.

I forgot to mention that the hopcount mode is "Keep unreachable
documents, forever".

So, if I understand you correctly, there is no point in hacking the
database, since MCF will try to refetch unreachable documents anyway. I
can of course enable HttpClient logging and check whether MCF tries to
fetch these resources at all.

Erlend

