manifoldcf-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Wright <daddy...@gmail.com>
Subject Re: Error: Unexpected jobqueue status
Date Wed, 04 Feb 2015 15:21:20 GMT
Thanks for the trace.

Here's what I get out of it:

The shutdown thread is waiting for all the threads to terminate:

>>>>>>
"Shutdown thread" prio=10 tid=0x00007fe740114000 nid=0xfa9 in Object.wait()
[0x00007fe710116000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at
org.apache.manifoldcf.core.system.ManifoldCF.sleep(ManifoldCF.java:1048)
    - locked <0x00000000e70cf2e8> (a java.lang.Integer)
    at
org.apache.manifoldcf.crawler.system.CrawlerAgent.stopSystem(CrawlerAgent.java:617)
    at
org.apache.manifoldcf.crawler.system.CrawlerAgent.stopAgent(CrawlerAgent.java:249)
    at
org.apache.manifoldcf.agents.system.AgentsDaemon.stopAgents(AgentsDaemon.java:168)
    - locked <0x00000000eafa3460> (a java.util.HashMap)
    at
org.apache.manifoldcf.agents.system.AgentsDaemon$AgentsShutdownHook.doCleanup(AgentsDaemon.java:395)
    at
org.apache.manifoldcf.core.system.ManifoldCF.cleanUpEnvironment(ManifoldCF.java:1340)
    - locked <0x00000000eafcb208> (a java.util.ArrayList)
    - locked <0x00000000eafcb2a0> (a java.lang.Integer)
    at
org.apache.manifoldcf.core.system.ManifoldCF$ShutdownThread.run(ManifoldCF.java:1565)
<<<<<<

There are a lot of Zookeeper threads still alive, but those don't matter
here.  There is precisely one thread that is blocking shutdown:

>>>>>>
"Startup thread" daemon prio=10 tid=0x00007fe73012f000 nid=0x340b in
Object.wait() [0x00007fe71f7f6000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1281)
    - locked <0x00000000e851a0b8> (a
org.apache.manifoldcf.core.database.Database$ExecuteQueryThread)
    at java.lang.Thread.join(Thread.java:1355)
    at
org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.finishUp(Database.java:694)
    at
org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:728)
    at
org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:790)
    at
org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1444)
    at
org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:146)
    at
org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:191)
    at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performModification(DBInterfacePostgreSQL.java:656)
    at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.analyzeTableInternal(DBInterfacePostgreSQL.java:1431)
    at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.noteModificationsNoTransactions(DBInterfacePostgreSQL.java:1576)
    at
org.apache.manifoldcf.core.database.Database.playbackModifications(Database.java:429)
    at
org.apache.manifoldcf.core.database.Database.endTransaction(Database.java:414)
    at
org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.endTransaction(DBInterfacePostgreSQL.java:1231)
    at
org.apache.manifoldcf.crawler.jobs.JobManager.resetStartupJob(JobManager.java:7575)
    at
org.apache.manifoldcf.crawler.system.StartupThread.run(StartupThread.java:238)
<<<<<<

This is performing a database ANALYZE TABLE operation, which *should* be
interruptible, given the trace, but is apparently *not*.  I'll have to look
at this to find a reason why it won't interrupt.  Offhand, I can see no
reason for it.

When you got this thread dump, how long had the system been waiting to shut
down?

Karl


On Wed, Feb 4, 2015 at 9:04 AM, Adrian Conlon <Adrian.Conlon@arup.com>
wrote:

>  Hi Karl,
>
>
>
> It’s taken me a little while before I needed to do some work on the
> server, but here’s a ‘jstack’ dump of an agents run that’s not responding
> to a stop request.
>
>
>
> Hope it’s helpful,
>
>
>
> Adrian
>
>
>
> *From:* Adrian Conlon [mailto:Adrian.Conlon@arup.com]
> *Sent:* 12 January 2015 15:35
> *To:* user@manifoldcf.apache.org
> *Subject:* RE: Error: Unexpected jobqueue status
>
>
>
> Thanks Karl.  Restarting the job manually did fix the problem.
>
>
>
> I might add a check for this in my software and kick the job into life
> again automatically, now I know it works…
>
>
>
> Adrian
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com <daddywri@gmail.com>]
> *Sent:* 12 January 2015 11:34
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Error: Unexpected jobqueue status
>
>
>
> Hi Adrian,
>
> Just restarting the job should be sufficient to get it sorted out after
> this kind of failure.
>
> Karl
>
>
>
> On Mon, Jan 12, 2015 at 6:19 AM, Adrian Conlon <Adrian.Conlon@arup.com>
> wrote:
>
>  Thanks Karl.
>
>
>
> With regards the runtime environment, apologies for the omission of
> Postgresql version.  It’s v9.3.
>
>
>
> For the stack trace, I’ve just installed a jdk on the problematic server
> and tried out “jstack” (that’s a neat tool!), so I’m all systems go for the
> next time agents process doesn’t respond to a stop request.
>
>
>
> With regards jobs that have this unexpected “jobqueue” status; do they
> sort themselves out the next time the job runs?  Is there anything I should
> do to “help” the job along?
>
>
>
> Adrian
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* 12 January 2015 00:27
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Error: Unexpected jobqueue status
>
>
>
> Also, if you are having trouble shutting down the agents process, it would
> be great if you could get a thread dump and post it, before you kill -9 it.
>
> Karl
>
>
>
> On Sun, Jan 11, 2015 at 7:25 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>  Hi Adrian,
>
> If you noted the comment stream in CONNECTORS-590, I was able to
> demonstrate conclusively that the problem was in Postgres.  I have not seen
> the problem in 9.3, but that does not mean it's gone.  What version of
> Postgresql are you using?
>
> In any case, while this problem definitely terminates your job, it will
> not happen very often.  I suspect the frequency of occurrence may depend on
> how loaded the database is.
>
> Karl
>
>
>
> On Sun, Jan 11, 2015 at 7:14 PM, Adrian Conlon <Adrian.Conlon@arup.com>
> wrote:
>
>  Hi All,
>
>
>
> I’m getting an occurrence of what looks very similar to CONNECTORS-590.
>
>
>
> The circumstances are:
>
>
>
> 1)      MCF Jobs proceeding very slowly (looks like a Postgresql vacuum
> is needed)
>
> 2)      Stop tomcat
>
> 3)      Attempt to stop the agents normally
>
> 4)      Wait a minute or two
>
> 5)      Decide to “kill -9” the agents process
>
> 6)      Vacuum the database
>
> 7)      Restart tomcat
>
> 8)      Restart the agents
>
>
>
> When I checked the job status page, I found that two of the jobs (out
> around 4000 or so) had the following status (or very similar):
>
>
>
> Error: Unexpected jobqueue status - record id 1417115392831, expecting
> active status, saw 4
>
>
>
> Setup-wise, I’m running a release candidate of v1.8 RC (I think RC2),
> using postresql as the crawl database and running on Ubuntu Linux.  I’m
> using zookeeper style synchronisation.
>
>
>
> Let me know if more information etc. is needed or if you think it’s a
> new/real issue.
>
> Adrian
>
> ____________________________________________________________
> Electronic mail messages entering and leaving Arup  business
> systems are scanned for acceptability of content and viruses
>
>
>
>
>
>
>

Mime
View raw message