manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Hop count problem
Date Mon, 12 Aug 2013 15:07:06 GMT
Hi Erlend,

You have wire logging (httpclient) enabled, which is useful for debugging
fetch issues, but you do not have connector debugging on.  To turn it on,
add this to properties.xml:

<property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
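For context, here is a minimal sketch of where that line might sit, assuming the usual properties.xml layout with a single <configuration> root element. The placeholder comment stands in for whatever properties your file already has, which stay unchanged:

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<configuration>
  <!-- ... your existing properties stay here unchanged ... -->

  <!-- Turns on connector-level debug logging, as described above -->
  <property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
</configuration>
```

The connector debug output should then show up in the same ManifoldCF log as the wire logging.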

thanks,
Karl


On Mon, Aug 12, 2013 at 10:53 AM, Erlend Garåsen <e.f.garasen@usit.uio.no> wrote:

> On 8/12/13 4:29 PM, Karl Wright wrote:
>
>> Hi Erlend,
>>
>> The Document Status report shows these documents because they are still
>> in the queue.  The reasons for this could be several.  Documents that
>> exceed the hopcount by 1 level are allowed to remain in the queue for
>> bookkeeping purposes.  The "scheduled date" as given is only meaningful if the
>> document is in an active state; my guess is that these documents are not
>> in fact in that state, but rather in the state HOPCOUNT_EXCEEDED.  Can
>> you include one complete row from the Document Status report for one of
>> the missing documents?
>>
>
> For "http://www.ibsen.uio.no/sakprosa.xhtml":
> Job: Ibsen
>
> State: Out of scope
> Status: Hopcount exceeded
> Scheduled: 01-01-1970 01:00:00.000
> Scheduled action: Process
> Retry count: N/A
> Retry limit: N/A
>
>
>> When you added documents to the seed list, what did the Simple History
>> say when they were fetched?  If they don't appear in the simple history,
>> they SHOULD have nevertheless appeared in the log, with an explanation
>> of why they were excluded, provided you have connector debugging enabled.
>>
>
> OK, here is the seed list:
> http://www.ibsen.uio.no/
>
> http://www.ibsen.uio.no/skuespill.xhtml
> http://www.ibsen.uio.no/dikt.xhtml
> http://www.ibsen.uio.no/brev.xhtml
> http://www.ibsen.uio.no/sakprosa.xhtml
> http://www.ibsen.uio.no/varia.xhtml
> http://www.ibsen.uio.no/undervisningsressurser.xhtml
>
> Here are the results from simple history:
> 08-12-2013 16:46:26.536   job end   1368534065016(Ibsen)      0   1
> 08-12-2013 16:46:09.927   document ingest (Solr)   http://www.ibsen.uio.no/forside.xhtml   OK   11897   178
> 08-12-2013 16:46:09.751   fetch   http://www.ibsen.uio.no/forside.xhtml   200   11897   17
> 08-12-2013 16:44:48.829   fetch   http://www.ibsen.uio.no/   302   0   79484
> 08-12-2013 16:44:48.727   robots parse   www.ibsen.uio.no:80   HTML   0   2   Robots file contained HTML, skipped
> 08-12-2013 16:44:46.574   job start   1368534065016(Ibsen)      0   1
>         1
>
> HttpClient log:
> http://folk.uio.no/erlendfg/manifoldcf/manifoldcf.log
>
> Erlend
>
>
