manifoldcf-user mailing list archives

From Erlend Garåsen <>
Subject Hop count problem
Date Mon, 12 Aug 2013 10:39:09 GMT

I have discovered an odd thing regarding hop counts. Our prod
environment crawls far fewer documents than our test
environment even though the configuration is exactly the same. I then
found that several documents which are expected to be fetched are,
according to MCF, outside the hop count limit, even though they are not.

This can be reproduced with a small job for one particular host. The seed list is as follows:

Hop filter settings are:
link: 6
redirect: 3

Only these two documents are fetched:

Here's what MCF says about one omitted document, i.e.,
State: out of scope
Status: Hopcount exceeded

This is odd. If you open up, you can see that the link 
"" (Skuespill) appears on the 
main page.

Our test environment fetches this document without problems.
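
To illustrate why the "Hopcount exceeded" status is unexpected, here is a minimal sketch of a link hop count computed as the shortest link distance from the seeds. This is not MCF's actual implementation; the function name `link_hop_counts` and the page names in the graph are hypothetical and only mirror the situation described above, where the omitted document is linked directly from the main page.

```python
from collections import deque

def link_hop_counts(seeds, links, max_hops):
    """Breadth-first search from the seeds.

    Returns {page: hops} for every page reachable within max_hops links.
    """
    hops = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        if hops[page] == max_hops:
            continue  # pages at the limit are kept, but their links are not followed
        for target in links.get(page, []):
            if target not in hops:  # first visit is the shortest path in BFS
                hops[target] = hops[page] + 1
                queue.append(target)
    return hops

# Hypothetical link graph: the seed ("main") links directly to "skuespill".
links = {"main": ["skuespill", "other"], "other": ["deep"]}
result = link_hop_counts(["main"], links, max_hops=6)
```

Under this model a document linked from a seed has a hop count of 1, well inside a link limit of 6, so it should be in scope.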

