manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Hop count problem
Date Tue, 13 Aug 2013 13:34:29 GMT
Hi Erlend,

I still don't see how this can happen after looking at the code.

Can you enable hopcount debugging and rerun?
Set "org.apache.manifoldcf.hopcount" to the value "DEBUG" in properties.xml.
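For reference, a minimal sketch of what that entry might look like in properties.xml. This assumes the usual `<property name="..." value="..."/>` form inside the file's root `<configuration>` element. Existing entries should stay as they are, and the change probably only takes effect after the ManifoldCF processes are restarted.

```xml
<configuration>
  <!-- ...existing properties unchanged... -->
  <!-- Turn on debug logging for hopcount bookkeeping -->
  <property name="org.apache.manifoldcf.hopcount" value="DEBUG"/>
</configuration>
```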

Thanks!
Karl
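To illustrate the reasoning in the quoted reply below, here is a simplified model of hop counting. It is not ManifoldCF's implementation: it ignores link types and incremental updates, and every document name is hypothetical. A document's hop count is its shortest link distance from any seed, so a seed has hop count 0 and anything a seed references directly has hop count 1. A document is "Hopcount exceeded" only when that minimum is greater than the configured limit.

```python
from collections import deque


def hop_counts(seeds, links):
    """Return the minimum number of hops from any seed to each reachable document.

    seeds: iterable of document IDs (each has hop count 0).
    links: dict mapping a document ID to the list of IDs it references.
    """
    dist = {s: 0 for s in seeds}
    queue = deque(dist)
    while queue:
        doc = queue.popleft()
        for target in links.get(doc, []):
            # In breadth-first order, the first time we reach a document
            # is along a shortest path, so its hop count is final.
            if target not in dist:
                dist[target] = dist[doc] + 1
                queue.append(target)
    return dist


def status(doc, seeds, links, max_hops):
    """Classify a document the way a hopcount filter conceptually would."""
    dist = hop_counts(seeds, links)
    if doc not in dist:
        return "unreachable"
    return "in scope" if dist[doc] <= max_hops else "hopcount exceeded"


# Hypothetical site: the seed links to "skuespill", which links onward.
links = {"seed": ["skuespill"], "skuespill": ["a"], "a": ["b"]}
print(status("skuespill", ["seed"], links, max_hops=1))
print(status("b", ["seed"], links, max_hops=1))
```

In this model, a document that is a seed, or is linked directly from one, can only be "hopcount exceeded" if the limit is below its distance (0 or 1). That is why the reported state is so surprising.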



On Tue, Aug 13, 2013 at 9:16 AM, Karl Wright <daddywri@gmail.com> wrote:

> Hmm. This is not at all what I would have expected.
>
> If "skuespill" is directly referenced by a seed document, or (worse) is in
> the seed list, I cannot see *how* the document can possibly have this state.
>
> - the referencing document definitely has a parseable reference to the
> document in question, and in any case having it be a "seed" should make the
> hopcount be zero;
> - if the reference is being filtered, it would be filtered from
> everywhere, and the document should thus get removed from the queue at the
> end of the job, because it is unreachable.
> - even if the hopcount tables have gotten corrupted, the fact that the
> document is a first-level reference or a seed should overwrite the record
> for that document.
>
> So I am at a complete loss to explain this behavior.
>
> Let me look through the code and see if I can find any code path that
> could lead to this behavior.
> Karl
>
>
> On Tue, Aug 13, 2013 at 9:01 AM, Erlend Garåsen <e.f.garasen@usit.uio.no> wrote:
>
>> On 8/13/13 2:47 PM, Karl Wright wrote:
>>
>>> Looks like you need to re-enable connector debugging before we can see
>>> anything.
>>>
>>
>> Unfortunately, yes. A boring task, but it must be done.
>>
>>
>>> Also, does the missing document (skuespill) appear in the Document
>>> Status report after the crawl?  Can you include that here if it does?
>>> (I am betting it does not...)
>>>
>>
>> I added 60 minutes as the time offset value, but I'm not 100% sure whether
>> the result shown in Document Status was created by this job run or is an
>> old entry in the database:
>>
>> Identifier: http://www.ibsen.uio.no/skuespill.xhtml
>>
>> Job: Ibsen
>> State: Out of scope
>> Status: Hopcount exceeded
>>
>> Scheduled: 01-01-1970 01:00:00.000
>> Scheduled action: Process
>> Retry count / limit: N/A
>>
>> Erlend
>>
>
>
