Hi Erlend,

I still don't see how this can happen after looking at the code.

Can you enable hopcount debugging and rerun? Set the property "org.apache.manifoldcf.hopcount" to the value "DEBUG" in properties.xml.
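If you'd rather script that change than edit the file by hand, here is a minimal Python sketch. It assumes properties.xml keeps its settings as <property name="..." value="..."/> elements directly under the root element. The set_property helper is hypothetical and not part of ManifoldCF.

```python
import xml.etree.ElementTree as ET


def set_property(path: str, name: str, value: str) -> None:
    """Add or update a <property name=... value=.../> entry in properties.xml.

    Hypothetical helper. Assumes the entries are direct children of the
    root element, e.g.
        <configuration>
          <property name="org.apache.manifoldcf.hopcount" value="DEBUG"/>
        </configuration>
    """
    # Keep existing XML comments when the file is rewritten (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    tree = ET.parse(path, parser)
    root = tree.getroot()
    for prop in root.findall("property"):
        if prop.get("name") == name:
            prop.set("value", value)  # update the existing entry in place
            break
    else:
        # No entry with that name yet: append a new one.
        ET.SubElement(root, "property", {"name": name, "value": value})
    tree.write(path, encoding="utf-8", xml_declaration=True)
```

Usage: set_property("properties.xml", "org.apache.manifoldcf.hopcount", "DEBUG"). Restarting ManifoldCF after the edit is the safe assumption, so the new logging level is actually picked up.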


On Tue, Aug 13, 2013 at 9:16 AM, Karl Wright <daddywri@gmail.com> wrote:
Hmm. This is not at all what I would have expected.

If "skuespill" is directly referenced by a seed document, or (worse) is in the seed list, I cannot see *how* the document can possibly have this state.

- the referencing document definitely has a parseable reference to the document in question, and in any case, if the document is a "seed", its hopcount should be zero;
- if the reference is being filtered, it would be filtered everywhere, so the document should be removed from the queue at the end of the job because it is unreachable;
- even if the hopcount tables have become corrupted, the fact that the document is a first-level reference or a seed should overwrite the record for that document.
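The expectation in the points above can be made concrete with a simplified model: a document's hopcount is the minimum number of link hops from any seed, computed here by breadth-first search. This is an illustrative sketch only, not ManifoldCF's implementation (which keeps hopcounts in database tables and updates them incrementally); it ignores link types and filtering, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List


def hopcounts(seeds: Iterable[str], links: Dict[str, List[str]]) -> Dict[str, int]:
    """Minimum number of link hops from any seed to each reachable document."""
    seeds = list(seeds)
    dist = {s: 0 for s in seeds}  # a seed always has hopcount 0
    queue = deque(seeds)
    while queue:
        doc = queue.popleft()
        for target in links.get(doc, []):
            if target not in dist:  # BFS: the first visit is the shortest path
                dist[target] = dist[doc] + 1
                queue.append(target)
    return dist


def status(doc: str, seeds: Iterable[str], links: Dict[str, List[str]], max_hops: int) -> str:
    """Classify a document the way the reasoning above expects."""
    dist = hopcounts(seeds, links).get(doc)
    if dist is None:
        return "unreachable"  # should be dropped from the queue at job end
    return "in scope" if dist <= max_hops else "hopcount exceeded"
```

Under this model a seed always gets 0 and a direct reference always gets 1, so for any hopcount limit of at least 1 neither can end up as "Hopcount exceeded"; that is why the reported state is so surprising.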

So I am at a complete loss to explain this behavior.

Let me look through the code and see if I can find any code path that could lead to this behavior.

On Tue, Aug 13, 2013 at 9:01 AM, Erlend Garåsen <e.f.garasen@usit.uio.no> wrote:
On 8/13/13 2:47 PM, Karl Wright wrote:
> Looks like you need to re-enable connector debugging before we can see

Unfortunately, yes. A boring task, but it must be done.

> Also, does the missing document (skuespill) appear in the Document
> Status report after the crawl?  Can you include that here if it does?
> (I am betting it does not...)

I set the time offset to 60 minutes, but I'm not 100% sure whether the following Document Status result was created by this job run or is an old entry in the database:

Identifier: http://www.ibsen.uio.no/skuespill.xhtml

Job: Ibsen
State: Out of scope
Status: Hopcount exceeded

Scheduled: 01-01-1970 01:00:00.000
Scheduled action: Process
Retry count / limit: N/A