manifoldcf-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Markus Schuch <markus_sch...@web.de>
Subject Re: Amazon RDS for PostgreSQL Support
Date Sat, 16 Dec 2017 15:21:44 GMT
would this be the right place?
https://cwiki.apache.org/confluence/display/CONNECTORS/FAQ

Am 16.12.2017 um 15:08 schrieb Karl Wright:
> We should write up your findings in a FAQ if you find MySQL to perform
> better than postgresql.
> 
> My guess is that what you are seeing is less "bad plan", more
> "contention and backoff", but that would take some effort to tease out.
> 
> Karl
> 
> On Fri, Dec 15, 2017 at 3:06 PM, Karl Wright <daddywri@gmail.com
> <mailto:daddywri@gmail.com>> wrote:
> 
>     Interesting.  Thanks for the update.
>     Karl
> 
>     On Fri, Dec 15, 2017 at 2:31 PM, Markus Schuch
>     <markus.schuch@deutschebahn.com
>     <mailto:markus.schuch@deutschebahn.com>> wrote:
> 
>         We were no able to resolve the performance issue with the
>         carrydown table.____
> 
>         __ __
> 
>         We switched to RDS Aurora MySQL, which performs similar to the
>         MariaDB database in our on premise datacenter. No problems with
>         the carrydown table queries of the sharepoint connector.____
> 
>         __ __
> 
>         Regards____
> 
>         Markus____
> 
>         __ __
> 
>         *Von:* Karl Wright [mailto:daddywri@gmail.com
>         <mailto:daddywri@gmail.com>]
>         *Gesendet:* Donnerstag, 30. November 2017 12:23
> 
> 
>         *An:* user@manifoldcf.apache.org <mailto:user@manifoldcf.apache.org>
>         *Betreff:* Re: Amazon RDS for PostgreSQL Support____
> 
>         __ __
> 
>         Typically Jobs table is short and sequential scans are faster
>         than index joins.  Postgres optimizes for that.____
> 
>         __ __
> 
>         No, the plans look fine.  Another reason for the long-running
>         queries might well be contention and locking -- many threads
>         will be trying to do similar things at the same time.  You will
>         note that multiple records get updated in one query; this is
>         usually helpful but when each update is expensive you could wind
>         up with locking causing delays.____
> 
>         __ __
> 
>         Karl____
> 
>         __ __
> 
>         __ __
> 
>         On Thu, Nov 30, 2017 at 2:33 AM, Markus Schuch
>         <markus.schuch@deutschebahn.com
>         <mailto:markus.schuch@deutschebahn.com>> wrote:____
> 
>             Hi Karl,____
> 
>              ____
> 
>             we disabled autovacuum____
> 
>             The stats table show there was no autovac since then.____
> 
>              ____
> 
>             The long running queries still occur.____
> 
>             There are no other apps using the database and no other jobs
>             running.____
> 
>              ____
> 
>             But there is another long running query to the jobs table
>             between the carrydown queries.____
> 
>             This query seem to happen at the same time with the long
>             running carrydown query.____
> 
>              ____
> 
>             The plan output says “Sec Scan on jobs…”. ____
> 
>              ____
> 
>             2017-11-30 08:16:16,008 WARN  [Finisher thread]
>             org.apache.manifoldcf.db: Found a long-running query (169057
>             ms): [SELECT id FROM jobs WHERE  (status=? OR status=? OR
>             status=?) FOR UPDATE]____
> 
>             2017-11-30 08:16:16,009 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:   Parameter 0: 'A'____
> 
>             2017-11-30 08:16:16,009 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:   Parameter 1: 'W'____
> 
>             2017-11-30 08:16:16,009 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:   Parameter 2: 'R'____
> 
>             2017-11-30 08:16:16,011 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:  Plan: LockRows  (cost=0.00..4.43
>             rows=3 width=14) (actual time=0.022..0.024 rows=1 loops=1)____
> 
>             2017-11-30 08:16:16,011 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:  Plan:   ->  Seq Scan on jobs 
>             (cost=0.00..4.40 rows=3 width=14) (actual time=0.021..0.022
>             rows=1 loops=1)____
> 
>             2017-11-30 08:16:16,011 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:  Plan:         Filter: ((status =
>             'A'::bpchar) OR (status = 'W'::bpchar) OR (status =
>             'R'::bpchar))____
> 
>             2017-11-30 08:16:16,011 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:  Plan:         Rows Removed by
>             Filter: 22____
> 
>             2017-11-30 08:16:16,011 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:  Plan: Planning time: 0.093 ms____
> 
>             2017-11-30 08:16:16,011 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:  Plan: Execution time: 0.041 ms____
> 
>             2017-11-30 08:16:16,011 WARN  [Finisher thread]
>             org.apache.manifoldcf.db:____
> 
>              ____
> 
>             Jobs table indices:____
> 
>             public    jobs       jobs_pkey                         
>             CREATE UNIQUE INDEX jobs_pkey ON jobs USING btree (id)____
> 
>             public    jobs      
>             i1511464305264                              
CREATE INDEX
>             i1511464305264 ON jobs USING btree (status, id, priority)____
> 
>             public    jobs      
>             i1511464305263                              
CREATE INDEX
>             i1511464305263 ON jobs USING btree (status, processid)____
> 
>             public    jobs      
>             i1511464305262                              
CREATE INDEX
>             i1511464305262 ON jobs USING btree (connectionname)____
> 
>             public    jobs      
>             i1511464305261                              
CREATE INDEX
>             i1511464305261 ON jobs USING btree (failtime)____
> 
>              ____
> 
>             Shouldn’t be i1511464305264 be used?____
> 
>              ____
> 
>             Many thanks in advance____
> 
>             Markus____
> 
>              ____
> 
>             *Von:* Karl Wright [mailto:daddywri@gmail.com
>             <mailto:daddywri@gmail.com>]
>             *Gesendet:* Mittwoch, 29. November 2017 23:28
>             *An:* user@manifoldcf.apache.org
>             <mailto:user@manifoldcf.apache.org>
>             *Betreff:* Re: Amazon RDS for PostgreSQL Support____
> 
>              ____
> 
>             The plans look good for the carrydown execution, so I have
>             to conclude that the long-running queries are due to other
>             considerations -- perhaps concurrent vacuuming, perhaps
>             other queries pounding the database.____
> 
>              ____
> 
>             Sharepoint is especially hard on the carrydown table -- it
>             gets huge.____
> 
>              ____
> 
>             Karl____
> 
>              ____
> 
>              ____
> 
>             On Wed, Nov 29, 2017 at 4:46 PM, Markus Schuch
>             <markus.schuch@deutschebahn.com
>             <mailto:markus.schuch@deutschebahn.com>> wrote:____
> 
>                 Hi,____
> 
>                  ____
> 
>                 since nobody responded, we started to experiment.____
> 
>                  ____
> 
>                 Setup:____
> 
>                 AWS RDS Postgres 9.6.3____
> 
>                 db.m4.xlarge (4 cores, 16 GB RAM)____
> 
>                 provisioned iops 4000____
> 
>                  ____
> 
>                 Indexing Performance for most jobs is pretty good,
>                 except a for a bigger sharepoint crawl (~200.000 docs).____
> 
>                  ____
> 
>                 We are seeing a lot of long running queries for the
>                 tables carrydown here.____
> 
>                  ____
> 
>                 Logfile:____
> 
>                 https://gist.githubusercontent.com/schuch/b553dc3f825e0c640a52ce3f530ec3e0/raw/34e493ef520d2272b25f7d9d8acc3f8417550c17/gistfile1.txt
>                 <https://gist.githubusercontent.com/schuch/b553dc3f825e0c640a52ce3f530ec3e0/raw/34e493ef520d2272b25f7d9d8acc3f8417550c17/gistfile1.txt>____
> 
>                  ____
> 
>                 Settings:____
> 
>                 https://gist.githubusercontent.com/schuch/44fcbb312df37ab89a910024a56ff835/raw/eba6ff9bc82026f181d5687584e9dd99cdffb242/postgresql%2520RDS%2520settings
>                 <https://gist.githubusercontent.com/schuch/44fcbb312df37ab89a910024a56ff835/raw/eba6ff9bc82026f181d5687584e9dd99cdffb242/postgresql%2520RDS%2520settings>____
> 
>                 (due to this discussion we left autovac on:
>                 https://www.mail-archive.com/user@manifoldcf.apache.org/msg03512.html
>                 <https://www.mail-archive.com/user@manifoldcf.apache.org/msg03512.html>)____
> 
>                  ____
> 
>                 CPU Usage of the Database is between 15% and 35%____
> 
>                  ____
> 
>                 Are we hitting bad auto optimiziations of the newer
>                 postgresql version?____
> 
>                  ____
> 
>                 Many thanks in advance,____
> 
>                 Markus____
> 
>              ____
> 
>         __ __
> 
> 
> 

Mime
View raw message