manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Manifold RSS connector gets "stuck" after a few docs are processed
Date Thu, 08 Mar 2018 15:37:13 GMT
As a sanity check, I ran the PostgreSQL RSS connector integration test on
trunk, and it passed:

>>>>>>
run-IT-postgresql:
    [junit] Testsuite: org.apache.manifoldcf.crawler.connectors.rss.tests.RSSSimpleCrawlPostgresqlIT
    [junit] Configuration file successfully read
    [junit] [main] INFO org.eclipse.jetty.util.log - Logging initialized @3336ms
    [junit] [main] INFO org.eclipse.jetty.server.Server - jetty-9.2.3.v20140905
    [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@4d1c005e{/mcf-crawler-ui,file:/C:/Users/kawright/AppData/Local/Temp/jetty-0.0.0.0-8346-mcf-crawler-ui.war-_mcf-crawler-ui-any-4871569714684839734.dir/webapp/,AVAILABLE}{C:\wip\mcf\trunk\dist/web/war/mcf-crawler-ui.war}
    [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@8462f31{/mcf-authority-service,file:/C:/Users/kawright/AppData/Local/Temp/jetty-0.0.0.0-8346-mcf-authority-service.war-_mcf-authority-service-any-8765187688005999492.dir/webapp/,AVAILABLE}{C:\wip\mcf\trunk\dist/web/war/mcf-authority-service.war}
    [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@24569dba{/mcf-api-service,file:/C:/Users/kawright/AppData/Local/Temp/jetty-0.0.0.0-8346-mcf-api-service.war-_mcf-api-service-any-1263632524762735599.dir/webapp/,AVAILABLE}{C:\wip\mcf\trunk\dist/web/war/mcf-api-service.war}
    [junit] [main] INFO org.eclipse.jetty.server.ServerConnector - Started ServerConnector@1e1ff947{HTTP/1.1}{0.0.0.0:8346}
    [junit] [main] INFO org.eclipse.jetty.server.Server - Started @6277ms
    [junit] [main] INFO org.eclipse.jetty.server.Server - jetty-9.2.3.v20140905
    [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.s.ServletContextHandler@7d286fb6{/rss,null,AVAILABLE}
    [junit] [main] INFO org.eclipse.jetty.server.ServerConnector - Started ServerConnector@3eb77ea8{HTTP/1.1}{0.0.0.0:8189}
    [junit] [main] INFO org.eclipse.jetty.server.Server - Started @6290ms
    [junit] Crawl required 90542 milliseconds
    [junit] [main] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@3eb77ea8{HTTP/1.1}{0.0.0.0:8189}
    [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.s.ServletContextHandler@7d286fb6{/rss,null,UNAVAILABLE}
    [junit] [main] INFO org.eclipse.jetty.server.ServerConnector - Stopped ServerConnector@1e1ff947{HTTP/1.1}{0.0.0.0:8346}
    [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@24569dba{/mcf-api-service,file:/C:/Users/kawright/AppData/Local/Temp/jetty-0.0.0.0-8346-mcf-api-service.war-_mcf-api-service-any-1263632524762735599.dir/webapp/,UNAVAILABLE}{C:\wip\mcf\trunk\dist/web/war/mcf-api-service.war}
    [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@8462f31{/mcf-authority-service,file:/C:/Users/kawright/AppData/Local/Temp/jetty-0.0.0.0-8346-mcf-authority-service.war-_mcf-authority-service-any-8765187688005999492.dir/webapp/,UNAVAILABLE}{C:\wip\mcf\trunk\dist/web/war/mcf-authority-service.war}
    [junit] [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Stopped o.e.j.w.WebAppContext@4d1c005e{/mcf-crawler-ui,file:/C:/Users/kawright/AppData/Local/Temp/jetty-0.0.0.0-8346-mcf-crawler-ui.war-_mcf-crawler-ui-any-4871569714684839734.dir/webapp/,UNAVAILABLE}{C:\wip\mcf\trunk\dist/web/war/mcf-crawler-ui.war}
    [junit] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 126.5 sec

BUILD SUCCESSFUL
Total time: 2 minutes 8 seconds
<<<<<<

The test ran against the PostgreSQL 9.3 installation on my Windows laptop,
using the PostgreSQL JDBC driver that we ship (42.1.3). The test is a simple
crawl against a locally-written RSS service.


Karl
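
For anyone comparing their setup against the versions above, here is a minimal sketch of reading the server and driver versions over JDBC. It uses only the standard `java.sql` metadata calls; the class name `VersionReport`, the helper `format`, and the command-line arguments are made up for illustration, and it assumes the PostgreSQL JDBC driver jar is already on the classpath.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper, not part of ManifoldCF.
class VersionReport {
    // Builds a one-line summary from the four metadata strings.
    static String format(String dbName, String dbVersion,
                         String driverName, String driverVersion) {
        return dbName + " " + dbVersion + " via " + driverName + " " + driverVersion;
    }

    // Reads the database and driver versions from an open connection.
    static String describe(Connection conn) throws SQLException {
        DatabaseMetaData md = conn.getMetaData();
        return format(md.getDatabaseProductName(), md.getDatabaseProductVersion(),
                      md.getDriverName(), md.getDriverVersion());
    }

    public static void main(String[] args) throws SQLException {
        if (args.length != 3) {
            System.err.println("usage: VersionReport <jdbc-url> <user> <password>");
            return;
        }
        // The JDBC URL and credentials are supplied by the caller.
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            System.out.println(describe(conn));
        }
    }
}
```

Running it with the same JDBC URL and credentials that the crawler uses should print one line naming both versions, which is the information being asked for further down the thread.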


On Thu, Mar 8, 2018 at 9:54 AM, Karl Wright <daddywri@gmail.com> wrote:

> I've reviewed all changes to the RSS connector and to the framework over
> the last year, and none of them could reasonably be expected to have an
> effect like this.  The only things changed were the redirect strategy and
> the update to the latest Postgresql JDBC driver.
>
> If the problem doesn't occur in the single-process example, the next
> question is: do you have a multiprocess setup?  If so, try the multiprocess
> example and see if that succeeds.  If it does, the problem is how we work
> with Postgresql.
>
> Karl
>
>
> On Thu, Mar 8, 2018 at 9:41 AM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Mike,
>>
>> You are the third person this morning who has reported this in
>> conjunction with Postgresql.  It is possible that some behavior we count on
>> broke in the latest Postgresql release.  Can you tell me what version you
>> are using?  Do you see the same behavior when you run with the built-in
>> HSQLDB example?
>>
>> Karl
>>
>>
>> On Thu, Mar 8, 2018 at 9:32 AM, Mike Hugo <mike@piragua.com> wrote:
>>
>>> Hello,
>>>
>>> I set up a new manifold instance based on the simple example.  I
>>> modified properties.xml to point to a postgresql database and then set it
>>> up to read an RSS feed.  It uses a custom output connector to send the data
>>> to a custom API.
>>>
>>> I've noticed that it starts properly, but it only pulls in 3 or 4
>>> records before it "hangs" and doesn't pull in more docs after that.  If I
>>> bounce the server then it will pull in 3 or 4 more docs, but then seems to
>>> hang again.
>>>
>>> I can add a new RSS feed and start it, but it won't pull in any
>>> documents until the server is bounced.
>>>
>>> I increased the value of org.apache.manifoldcf.crawler.threads and that
>>> seems to help, but it just delays the same behavior.  For example, it might
>>> pull in 10 or 15 docs, but then stops pulling them in again.  No messages
>>> in the logs.
>>>
>>> It does appear that it's spawning a great many of these threads:
>>> ExecuteQueryThread
>>>
>>> Any ideas where to start looking or how to debug why it hangs after only
>>> a few documents?
>>>
>>> Thanks!!
>>>
>>> Mike
>>>
>>
>>
>
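
One way to put a number on the "ExecuteQueryThread" observation above is to take a thread dump of the hung process (for example with the JDK's `jstack <pid>`) and count threads by name. The sketch below is a rough aid, not ManifoldCF code: the class name `ThreadDumpSummary` is made up, and it assumes the usual HotSpot dump layout in which each thread's header line starts with the thread name in double quotes. If the query threads turn out to carry generic names, grouping by stack frame would be needed instead.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of ManifoldCF.
class ThreadDumpSummary {
    // Counts threads by name. A thread header line is assumed to begin with
    // the quoted thread name, as in: "main" #1 prio=5 ...
    static Map<String, Integer> countByName(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : dump.split("\\R")) {
            if (!line.startsWith("\"")) {
                continue; // stack frames and blank lines
            }
            int end = line.indexOf('"', 1);
            if (end < 0) {
                continue; // malformed header, skip it
            }
            counts.merge(line.substring(1, end), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: ThreadDumpSummary <thread-dump-file>");
            return;
        }
        String dump = new String(Files.readAllBytes(Paths.get(args[0])),
                                 StandardCharsets.UTF_8);
        // Print one "count<TAB>name" line per distinct thread name.
        countByName(dump).forEach((name, n) -> System.out.println(n + "\t" + name));
    }
}
```

Comparing the counts from two dumps taken a minute apart would show whether the number of query threads keeps growing or sits at a fixed ceiling, which helps narrow down where the crawl is blocked.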
