manifoldcf-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Standen Guy <Guy.Stan...@uk.fujitsu.com>
Subject Lock Warnings and Long Queries
Date Tue, 31 Jan 2017 15:19:30 GMT
Hi Karl,
I hope you might be able to provide some advice and guidance about a job lock up and performance
issue we are seeing with ManifoldCF 2.0.1 in our environment.
Environment:
We are running ManifoldCF version 2.0.1 on Windows 2008 R2   using a PostgreSQL 9.3 database
We have used the ManifoldCF multiprocess-file-example as the basis of the installation. (As
an aside for a production environment would it be acceptable to use the file-based synchronisation
model?)
We are calling the "start-agents.bat" file from a NSSM service wrapper (could there be an
issue if we do not call the stop-agents.bat file when the service wrapper shuts down.  NSSM
shuts down the start-agents .bat by sending it the equivalent of Ctrl-C).

The database has been set up using the configuration parameter values in the postgresql.conf
parameters identified in https://manifoldcf.apache.org/release/release-2.5/en_US/how-to-build-and-deploy.html
We crawl various repositories including:
-Website with 1.8 million documents (crawled using 16 different jobs - each run manually)
-SharePoint instance with only a few hundred documents (2 or 3 crawl Jobs)   run hourly
-Website crawled every 5 mins ( upto 4 documents returned - 60kb each)
The documents are output to multiple collections in SOLR Cloud 4.10.3

We have been running the PostgreSQL Vacuum Full and Re-index manually (each take an hour to
run) on a semi regular basis and before and after large jobs.
Looking at the PostgreSQL statistics it appears that the Jobqueue table is being analysed
every couple of minutes.

Issues:
Issue 1: Warnings about locks
Regularly we see lock errors in the logs which sometimes result in all the jobs halting until
a lock-clean is performed.
We have checked that all the manifoldcf processes are running as the same user (SYSTEM), 
 and that AV and Windows indexing are not scanning the Syncharea.

The kinds of lock errors we are seeing include:

"WARN 2017-01-19 04:33:40,964 (Idle cleanup thread) - Attempt to set file lock 'D:\Apps\ManifoldCF\apache-manifoldcf-2.0.1\multiprocess-file-example\.\.\syncharea\602\12\lock-_Cache_OUTPUTCONNECTION_Solr
COLL1 osp_unstruct.lock' failed: Access is denied java.io.IOException: Access is denied"

and
WARN 2017-01-19 03:22:33,312 (Idle cleanup thread) - Attempt to set file lock 'D:\Apps\ManifoldCF\apache-manifoldcf-2.0.1\multiprocess-file-example\.\.\syncharea\386\230\lock-_POOLTARGET__TRANSFORMATIONCONNECTORPOOL_Metadata
Adjuster.lock' failed: Access is denied
java.io.IOException: Access is denied

and
WARN 2017-01-19 03:35:39,864 (Idle cleanup thread) - Attempt to set file lock 'D:\Apps\ManifoldCF\apache-manifoldcf-2.0.1\multiprocess-file-example\.\.\syncharea\475\708\lock-_POOLTARGET__OUTPUTCONNECTORPOOL_Solr
COLL1 osp_unstruct.lock' failed: Access is denied
java.io.IOException: Access is denied







Issue 2:  Log contains  Error : Unexpected jobqueue status

"WARN 2017-01-16 13:57:20,954 (Worker thread '7') - Service interruption reported for job
1442819500587 connection WebSource1: IO exception reading header: Connection reset
WARN 2017-01-16 13:59:51,042 (Worker thread '72') - Service interruption reported for job
1442819500587 connection WebSource1: IO exception reading header: Connection reset
WARN 2017-01-16 13:59:51,042 (Worker thread '93') - Service interruption reported for job
1442819500587 connection WebSource1: IO exception reading header: Connection reset
WARN 2017-01-16 13:59:51,057 (Worker thread '82') - Service interruption reported for job
1442819500587 connection WebSource1: IO exception reading header: Connection reset
WARN 2017-01-16 13:59:51,057 (Worker thread '103') - Service interruption reported for job
1442819500587 connection WebSource1: IO exception reading header: Connection reset
WARN 2017-01-16 13:59:51,057 (Worker thread '47') - Service interruption reported for job
1442819500587 connection WebSource1: IO exception reading header: Connection reset
WARN 2017-01-16 14:00:04,037 (Worker thread '74') - Service interruption reported for job
1442819500587 connection WebSource1: IO exception reading header: Connection reset
WARN 2017-01-16 14:00:04,037 (Worker thread '43') - Service interruption reported for job
1442819500587 connection WebSource1: IO exception reading header: Connection reset
WARN 2017-01-16 14:00:04,037 (Worker thread '101') - Service interruption reported for job
1442819500587 connection WebSource1: IO exception reading header: Connection reset
WARN 2017-01-16 14:00:04,037 (Worker thread '109') - Service interruption reported for job
1442819500587 connection WebSource1: IO exception reading header: Connection reset
WARN 2017-01-16 14:00:04,037 (Worker thread '77') - Service interruption reported for job
1442819500587 connection WebSource1: IO exception reading header: Connection reset
WARN 2017-01-16 14:00:04,037 (Worker thread '15') - Service interruption reported for job
1442819500587 connection WebSource1: IO exception reading header: Connection reset
WARN 2017-01-16 14:00:04,037 (Worker thread '30') - Service interruption reported for job
1442819500587 connection WebSource1: IO exception reading header: Connection reset
ERROR 2017-01-16 14:01:03,566 (Worker thread '43') - Exception tossed: Unexpected jobqueue
status - record id 1463482497412, expecting active status, saw 4
org.apache.manifoldcf.core.interfaces.ManifoldCFException: Unexpected jobqueue status - record
id 1463482497412, expecting active status, saw 4
at org.apache.manifoldcf.crawler.jobs.JobQueue.updateCompletedRecord(JobQueue.java:1019)
                at org.apache.manifoldcf.crawler.jobs.JobManager.markDocumentCompletedMultiple(JobManager.java:3118)
at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:661)


Issue 3: Various warnings about long running queries
Example 1
WARN 2017-01-16 14:08:35,670 (Startup thread) - Found a long-running query (98326 ms): [UPDATE
hopcount SET distance=?,deathmark=? WHERE id IN(SELECT t0.ownerid FROM hopdeletedeps t0,jobqueue
t99,intrinsiclink t1 WHERE t0.jobid=? AND t1.jobid=? AND t1.parentidhash=t0.parentidhash AND
t1.linktype=t0.linktype AND t1.childidhash=t0.childidhash AND t99.jobid=? AND t99.dochash=t0.childidhash
AND  (t99.status=? OR t99.status=?))]
WARN 2017-01-16 14:08:35,670 (Startup thread) -   Parameter 0: '-1'
WARN 2017-01-16 14:08:35,670 (Startup thread) -   Parameter 1: 'D'
WARN 2017-01-16 14:08:35,670 (Startup thread) -   Parameter 2: '1442819500587'
WARN 2017-01-16 14:08:35,670 (Startup thread) -   Parameter 3: '1442819500587'
WARN 2017-01-16 14:08:35,670 (Startup thread) -   Parameter 4: '1442819500587'
WARN 2017-01-16 14:08:35,670 (Startup thread) -   Parameter 5: 'P'
WARN 2017-01-16 14:08:35,670 (Startup thread) -   Parameter 6: 'H'
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan: Update on hopcount  (cost=7229.16..7236.97
rows=1 width=87)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:   ->  Nested Loop  (cost=7229.16..7236.97
rows=1 width=87)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:         ->  HashAggregate  (cost=7228.73..7228.74
rows=1 width=26)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:               ->  Nested Loop  (cost=9.99..7228.72
rows=1 width=26)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                     ->  Nested Loop
 (cost=9.43..7220.35 rows=1 width=148)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                           ->  Bitmap
Heap Scan on jobqueue t99  (cost=8.87..12.89 rows=1 width=47)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                                 Recheck
Cond: ((status = 'P'::bpchar) OR (status = 'H'::bpchar))
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                                 Filter:
(jobid = 1442819500587::bigint)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                                 ->
 BitmapOr  (cost=8.87..8.87 rows=1 width=0)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                                   
   ->  Bitmap Index Scan on i1437569161257  (cost=0.00..4.44 rows=1 width=0)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                                   
         Index Cond: (status = 'P'::bpchar)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                                   
   ->  Bitmap Index Scan on i1437569161257  (cost=0.00..4.44 rows=1 width=0)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                                   
         Index Cond: (status = 'H'::bpchar)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                           ->  Index
Scan using i1437569161250 on hopdeletedeps t0  (cost=0.56..7188.01 rows=1945 width=101)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                                 Index
Cond: ((jobid = 1442819500587::bigint) AND ((childidhash)::text = (t99.dochash)::text))
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                     ->  Index Scan
using i1437569161252 on intrinsiclink t1  (cost=0.55..8.36 rows=1 width=93)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:                           Index Cond:
((jobid = 1442819500587::bigint) AND ((parentidhash)::text = (t0.parentidhash)::text) AND
((linktype)::text = (t0.linktype)::text) AND ((childidhash)::text = (t0.childidhash)::text))
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:         ->  Index Scan using hopcount_pkey
on hopcount  (cost=0.43..8.22 rows=1 width=69)
WARN 2017-01-16 14:08:35,702 (Startup thread) -  Plan:               Index Cond: (id = t0.ownerid)
WARN 2017-01-16 14:08:35,702 (Startup thread) -
 WARN 2017-01-16 14:08:36,279 (Startup thread) -  Stats: n_distinct=3.0 most_common_vals={C,G,Z}
most_common_freqs={0.97649997,0.023066666,0.00043333333}
WARN 2017-01-16 14:08:36,279 (Startup thread) -

Example 2
WARN 2017-01-18 03:49:17,802 (Worker thread '73') - Found a long-running query (157622 ms):
[SELECT t0.id,t0.dochash,t0.docid FROM carrydown t1, jobqueue t0 WHERE t1.jobid=? AND t1.parentidhash=?
AND t0.dochash=t1.childidhash AND t0.jobid=t1.jobid AND t1.isnew=?]
WARN 2017-01-18 03:49:17,802 (Worker thread '73') -   Parameter 0: '1443087309509'
WARN 2017-01-18 03:49:17,802 (Worker thread '73') -   Parameter 1: '7397E59F70C5FA473E178890A7A96C871833F621'
WARN 2017-01-18 03:49:17,802 (Worker thread '73') -   Parameter 2: 'B'
WARN 2017-01-18 03:49:17,865 (Worker thread '73') -  Plan: Nested Loop  (cost=0.82..14.62
rows=1 width=126) (actual time=0.020..0.020 rows=0 loops=1)
WARN 2017-01-18 03:49:17,896 (Worker thread '73') -  Plan:   ->  Index Scan using i1437569161245
on carrydown t1  (cost=0.27..6.04 rows=1 width=49) (actual time=0.007..0.007 rows=0 loops=1)
WARN 2017-01-18 03:49:17,896 (Worker thread '73') -  Plan:         Index Cond: ((jobid = 1443087309509::bigint)
AND ((parentidhash)::text = '7397E59F70C5FA473E178890A7A96C871833F621'::text))
WARN 2017-01-18 03:49:17,896 (Worker thread '73') -  Plan:         Filter: (isnew = 'B'::bpchar)
WARN 2017-01-18 03:49:17,896 (Worker thread '73') -  Plan:   ->  Index Scan using i1437569161255
on jobqueue t0  (cost=0.55..8.57 rows=1 width=134) (never executed)


Example 3:
WARN 2017-01-18 04:37:26,435 (Expire stuffer thread) - Found a long-running query (157638
ms): [SELECT t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount FROM jobqueue
t0 WHERE  (t0.status=? OR t0.status=?) AND t0.checkaction=? AND t0.checktime<=? AND EXISTS(SELECT
'x' FROM jobs t1 WHERE  (t1.status=? OR t1.status=?) AND t1.id=t0.jobid) AND NOT EXISTS(SELECT
'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND  (t2.status=? OR t2.status=? OR t2.status=?
OR t2.status=? OR t2.status=? OR t2.status=?) AND t2.jobid!=t0.jobid) LIMIT 100]
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 0: 'P'
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 1: 'G'
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 2: 'D'
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 3: '1484714088797'
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 4: 'A'
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 5: 'a'
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 6: 'A'
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 7: 'F'
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 8: 'a'
WARN 2017-01-18 04:37:26,451 (Expire stuffer thread) -   Parameter 9: 'f'
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -   Parameter 10: 'D'
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -   Parameter 11: 'd'
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan: Limit  (cost=17.72..45.87 rows=1
width=152) (actual time=0.043..0.043 rows=0 loops=1)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:   ->  Nested Loop Anti Join
 (cost=17.72..45.87 rows=1 width=152) (actual time=0.040..0.040 rows=0 loops=1)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:         ->  Nested Loop Semi
Join  (cost=13.16..25.21 rows=1 width=152) (actual time=0.038..0.038 rows=0 loops=1)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:               ->  Bitmap
Heap Scan on jobqueue t0  (cost=8.88..12.90 rows=1 width=152) (actual time=0.036..0.036 rows=0
loops=1)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                     Recheck
Cond: (((status = 'P'::bpchar) AND (checkaction = 'D'::bpchar) AND (checktime <= 1484714088797::bigint))
OR ((status = 'G'::bpchar) AND (checkaction = 'D'::bpchar) AND (checktime <= 1484714088797::bigint)))
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                     ->  BitmapOr
 (cost=8.88..8.88 rows=1 width=0) (actual time=0.033..0.033 rows=0 loops=1)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                           ->
 Bitmap Index Scan on i1437569161257  (cost=0.00..4.44 rows=1 width=0) (actual time=0.017..0.017
rows=0 loops=1)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                            
    Index Cond: ((status = 'P'::bpchar) AND (checkaction = 'D'::bpchar) AND (checktime <=
1484714088797::bigint))
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                           ->
 Bitmap Index Scan on i1437569161257  (cost=0.00..4.44 rows=1 width=0) (actual time=0.012..0.012
rows=0 loops=1)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                            
    Index Cond: ((status = 'G'::bpchar) AND (checkaction = 'D'::bpchar) AND (checktime <=
1484714088797::bigint))
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:               ->  Bitmap
Heap Scan on jobs t1  (cost=4.28..8.29 rows=1 width=8) (never executed)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                     Recheck
Cond: (id = t0.jobid)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                     Filter:
((status = 'A'::bpchar) OR (status = 'a'::bpchar))
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                     ->  Bitmap
Index Scan on jobs_pkey  (cost=0.00..4.28 rows=1 width=0) (never executed)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                           Index
Cond: (id = t0.jobid)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:         ->  Bitmap Heap Scan
on jobqueue t2  (cost=4.57..12.61 rows=1 width=49) (never executed)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:               Recheck Cond:
((dochash)::text = (t0.dochash)::text)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:               Filter: ((jobid
<> t0.jobid) AND ((status = 'A'::bpchar) OR (status = 'F'::bpchar) OR (status = 'a'::bpchar)
OR (status = 'f'::bpchar) OR (status = 'D'::bpchar) OR (status = 'd'::bpchar)))
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:               ->  Bitmap
Index Scan on i1437569161255  (cost=0.00..4.57 rows=2 width=0) (never executed)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan:                     Index Cond:
((dochash)::text = (t0.dochash)::text)
WARN 2017-01-18 04:37:26,466 (Expire stuffer thread) -  Plan: Total runtime: 0.151 ms

Best Regards,

Guy

Guy Standen



Unless otherwise stated, this email has been sent from Fujitsu Services Limited (registered
in England No 96056); Fujitsu EMEA PLC (registered in England No 2216100) both with registered
offices at: 22 Baker Street, London W1U 3BW;  PFU (EMEA) Limited, (registered in England No
1578652) and Fujitsu Laboratories of Europe Limited (registered in England No. 4153469) both
with registered offices at: Hayes Park Central, Hayes End Road, Hayes, Middlesex, UB4 8FE.

This email is only for the use of its intended recipient. Its contents are subject to a duty
of confidence and may be privileged. Fujitsu does not guarantee that this email has not been
intercepted and amended or that it is virus-free.
Mime
View raw message