manifoldcf-user mailing list archives

From Erlend Garåsen <e.f.gara...@usit.uio.no>
Subject Re: Hop count problem
Date Mon, 12 Aug 2013 15:15:03 GMT

Thanks, I will do that tomorrow and report back afterwards. I hope we
will find a simple explanation. :)

E

On 8/12/13 5:07 PM, Karl Wright wrote:
> Hi Erlend,
>
> You have wire logging (httpclient) enabled, which is useful for
> debugging fetch issues, but you do not have connector debugging on.  To
> turn it on, add this to properties.xml:
>
> <property name="org.apache.manifoldcf.connectors" value="DEBUG"/>
>
> thanks,
> Karl
>
>
> On Mon, Aug 12, 2013 at 10:53 AM, Erlend Garåsen
> <e.f.garasen@usit.uio.no> wrote:
>
>     On 8/12/13 4:29 PM, Karl Wright wrote:
>
>         Hi Erlend,
>
>         The Document Status report shows these documents because they are
>         still in the queue.  There could be several reasons for this.
>         Documents that exceed the hopcount by 1 level are allowed to remain
>         in the queue for bookkeeping purposes.  The "scheduled date" as
>         given is only meaningful if the document is in an active state; my
>         guess is that these documents are not in fact in that state, but
>         rather in the state HOPCOUNT_EXCEEDED.  Can you include one complete
>         row from the Document Status report for one of the missing
>         documents?
>
>
>     For "http://www.ibsen.uio.no/sakprosa.xhtml":
>     Job: Ibsen
>
>     State: Out of scope
>     Status: Hopcount exceeded
>     Scheduled: 01-01-1970 01:00:00.000
>     Scheduled action: Process
>     Retry count: N/A
>     Retry limit: N/A
>
>
>         When you added documents to the seed list, what did the Simple
>         History say when they were fetched?  If they don't appear in the
>         Simple History, they SHOULD nevertheless have appeared in the log,
>         with an explanation of why they were excluded, provided you have
>         connector debugging enabled.
>
>
>     OK, here is the seed list:
>     http://www.ibsen.uio.no/
>
>     http://www.ibsen.uio.no/skuespill.xhtml
>     http://www.ibsen.uio.no/dikt.xhtml
>     http://www.ibsen.uio.no/brev.xhtml
>     http://www.ibsen.uio.no/sakprosa.xhtml
>     http://www.ibsen.uio.no/varia.xhtml
>     http://www.ibsen.uio.no/undervisningsressurser.xhtml
>
>     Here are the results from the Simple History:
>     08-12-2013 16:46:26.536  job end  1368534065016(Ibsen)  0  1
>     08-12-2013 16:46:09.927  document ingest (Solr)  http://www.ibsen.uio.no/forside.xhtml  OK  11897  178
>     08-12-2013 16:46:09.751  fetch  http://www.ibsen.uio.no/forside.xhtml  200  11897  17
>     08-12-2013 16:44:48.829  fetch  http://www.ibsen.uio.no/  302  0  79484
>     08-12-2013 16:44:48.727  robots parse  www.ibsen.uio.no:80  HTML  0  2  Robots file contained HTML, skipped
>     08-12-2013 16:44:46.574  job start  1368534065016(Ibsen)  0  1
>     1
>
>     HttpClient log:
>     http://folk.uio.no/erlendfg/manifoldcf/manifoldcf.log
>
>     Erlend
>
>

