I've opened a ticket - CONNECTORS-764.

Karl


On Tue, Aug 13, 2013 at 6:32 PM, Karl Wright wrote:

> I may have a scenario that could trigger the problem.
>
> (1) Set the max hops for a job relatively low
> (2) Crawl
> (3) Increase the max hops
> (4) Crawl again
>
> I think under these conditions, it may be that we're not properly removing
> the "hop count exceeded" states for documents that were encountered in the
> first crawl that were too far from the seeds.
>
> If this is the problem, it should be easy to confirm. I'm not quite sure
> how to fix it yet though - need to do some research.
>
> Karl
>
>
> On Tue, Aug 13, 2013 at 10:16 AM, Karl Wright wrote:
>
>> Hi Erlend,
>>
>> I see what must be happening. The intrinsiclink table already has the
>> link to the skuespill document in it, and because of that, nothing in the
>> hopcount world is even getting looked at. So in a nutshell, the problem is
>> that somehow the hopcount table's data was messed up, but now there's no
>> good way to recover.
>>
>> I would really like to know how it got messed up in the first place, but
>> since there's been a lot of activity on that machine it would be a
>> challenge to come up with the exact sequence of events. If you think you
>> remember it, please write it down and maybe try it on your test instance.
>> But for now, the simplest way to get the production instance back up and
>> running is to do the following:
>>
>> - Note all the job settings and configuration
>> - Delete the job
>> - Recreate the job
>> - Run the job
>>
>> Since there are very few documents in the job, it does not sound like
>> much of a problem to do that. Would this work for you?
>>
>> Karl
>>
>>
>>
>> On Tue, Aug 13, 2013 at 9:55 AM, Erlend Garåsen wrote:
>>
>>> On 8/13/13 3:34 PM, Karl Wright wrote:
>>>
>>>> Can you enable hopcount debugging, and rerun?
>>>> "org.apache.manifoldcf.hopcount" set to the value "DEBUG" in
>>>> properties.xml.
>>>>
>>>
>>> For some odd reason, MCF does not log anything more with this
>>> configuration entry enabled:
>>>
>>>
>>> I have double-checked everything - the configuration file is successfully
>>> read after I restart the Agent process and there are no old processes
>>> running (checked with the ps command). MCF has responded to every change I
>>> have made to properties.xml so far, but not this one.
>>>
>>> Here's the log output:
>>>
>>> WARN 2013-08-13 15:50:57,350 (Worker thread '24') - Web: Unknown
>>> robots.txt line from 'www.ibsen.uio.no:80': '>> encoding="UTF-8"?>'
>>> WARN 2013-08-13 15:50:57,351 (Worker thread '24') - Web: Unknown
>>> robots.txt line from 'www.ibsen.uio.no:80': '
>>> WARN 2013-08-13 15:50:57,351 (Worker thread '24') - Web: Unknown
>>> robots.txt line from 'www.ibsen.uio.no:80': ' PUBLIC "-//W3C//DTD
>>> XHTML 1.0 Transitional//EN"
>>> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
>>> WARN 2013-08-13 15:50:57,352 (Worker thread '24') - Web: Unknown
>>> robots.txt line from 'www.ibsen.uio.no:80': '>>
>>> saxon-error-attribute="http://www.w3.org/1999/xhtml"
>>> xml:lang="no"> Henrik Ibsens skrifter:
>>> Feilmelding>> href="rammeverk.css" media="all"/>>>
>>> href="vitnemouseover.css"/><link
>>> xmlns:tei="http://www.tei-c.org/ns/1.0"
>>> xmlns:HIS="http://www.example.org/ns/HIS"
>>> xmlns:exist="http://exist.sourceforge.net/NS/exist"
>>> rel="icon" type="image/png" href="icons/favicon.ico"/><script
>>> xmlns:tei="http://www.tei-c.org/ns/1.0"
>>> xmlns:HIS="http://www.example.org/ns/HIS"
>>> xmlns:exist="http://exist.sourceforge.net/NS/exist"
>>> src="http://code.jquery.com/jquery-1.6.2.min.js"
>>> type="text/javascript">return void;