manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Amazon RDS for PostgreSQL Support
Date Sat, 16 Dec 2017 14:08:17 GMT
We should write up your findings in a FAQ if you find MySQL to perform
better than PostgreSQL.

My guess is that what you are seeing is less "bad plan", more "contention
and backoff", but that would take some effort to tease out.
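
A minimal sketch of how such contention can be surfaced on PostgreSQL 9.6 or
later, assuming nothing beyond the standard statistics views
(pg_blocking_pids() is built in from 9.6):

-- Which sessions are currently blocked, and which sessions hold the
-- locks they are waiting on. Run this while a long-running query is
-- being reported.
SELECT blocked.pid                  AS blocked_pid,
       now() - blocked.query_start  AS blocked_for,
       blocked.query                AS blocked_query,
       blocking.pid                 AS blocking_pid,
       blocking.state               AS blocking_state,
       blocking.query               AS blocking_query
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid))
WHERE cardinality(pg_blocking_pids(blocked.pid)) > 0;

If rows show up here during the stalls, the time is going to lock waits
rather than to the plan itself.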

Karl

On Fri, Dec 15, 2017 at 3:06 PM, Karl Wright <daddywri@gmail.com> wrote:

> Interesting.  Thanks for the update.
> Karl
>
> On Fri, Dec 15, 2017 at 2:31 PM, Markus Schuch <
> markus.schuch@deutschebahn.com> wrote:
>
>> We were not able to resolve the performance issue with the carrydown table.
>>
>>
>>
>> We switched to RDS Aurora MySQL, which performs similarly to the MariaDB
>> database in our on-premises datacenter. There are no problems with the
>> carrydown table queries of the SharePoint connector.
>>
>>
>>
>> Regards
>>
>> Markus
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Thursday, November 30, 2017 12:23
>>
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Amazon RDS for PostgreSQL Support
>>
>>
>>
>> Typically the Jobs table is short, and sequential scans are faster than index
>> joins.  Postgres optimizes for that.
>>
>>
>>
>> No, the plans look fine.  Another reason for the long-running queries
>> might well be contention and locking -- many threads will be trying to do
>> similar things at the same time.  You will note that multiple records get
>> updated in one query; this is usually helpful, but when each update is
>> expensive you could wind up with locking causing delays.
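
As an illustration of the effect described above, a sketch of two sessions
issuing the statement seen later in the log (the timing of the overlap is
hypothetical):

-- Session 1 (worker thread A)
BEGIN;
SELECT id FROM jobs WHERE (status='A' OR status='W' OR status='R') FOR UPDATE;
-- thread A now updates the selected rows; the row locks are held until COMMIT

-- Session 2 (worker thread B), started while session 1 is still open
BEGIN;
SELECT id FROM jobs WHERE (status='A' OR status='W' OR status='R') FOR UPDATE;
-- blocks here until session 1 commits or rolls back, even though the
-- statement's own plan is cheap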
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Thu, Nov 30, 2017 at 2:33 AM, Markus Schuch <
>> markus.schuch@deutschebahn.com> wrote:
>>
>> Hi Karl,
>>
>>
>>
>> We disabled autovacuum.
>>
>> The stats table shows there has been no autovacuum since then.
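
For reference, a sketch of that check against the standard statistics view
(table names as used in this thread):

-- When were these tables last vacuumed/analyzed, manually or by autovacuum?
SELECT relname, n_live_tup, n_dead_tup,
       last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('jobs', 'carrydown');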
>>
>>
>>
>> The long-running queries still occur.
>>
>> There are no other apps using the database and no other jobs running.
>>
>>
>>
>> But there is another long-running query against the jobs table between the
>> carrydown queries.
>>
>> This query seems to happen at the same time as the long-running
>> carrydown query.
>>
>>
>>
>> The plan output says “Seq Scan on jobs…”.
>>
>>
>>
>> 2017-11-30 08:16:16,008 WARN  [Finisher thread] org.apache.manifoldcf.db: Found a long-running query (169057 ms): [SELECT id FROM jobs WHERE (status=? OR status=? OR status=?) FOR UPDATE]
>> 2017-11-30 08:16:16,009 WARN  [Finisher thread] org.apache.manifoldcf.db:   Parameter 0: 'A'
>> 2017-11-30 08:16:16,009 WARN  [Finisher thread] org.apache.manifoldcf.db:   Parameter 1: 'W'
>> 2017-11-30 08:16:16,009 WARN  [Finisher thread] org.apache.manifoldcf.db:   Parameter 2: 'R'
>> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan: LockRows  (cost=0.00..4.43 rows=3 width=14) (actual time=0.022..0.024 rows=1 loops=1)
>> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan:   ->  Seq Scan on jobs  (cost=0.00..4.40 rows=3 width=14) (actual time=0.021..0.022 rows=1 loops=1)
>> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan:         Filter: ((status = 'A'::bpchar) OR (status = 'W'::bpchar) OR (status = 'R'::bpchar))
>> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan:         Rows Removed by Filter: 22
>> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan: Planning time: 0.093 ms
>> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:  Plan: Execution time: 0.041 ms
>> 2017-11-30 08:16:16,011 WARN  [Finisher thread] org.apache.manifoldcf.db:
>>
>>
>>
>> Jobs table indices:
>>
>> public    jobs    jobs_pkey         CREATE UNIQUE INDEX jobs_pkey ON jobs USING btree (id)
>> public    jobs    i1511464305264    CREATE INDEX i1511464305264 ON jobs USING btree (status, id, priority)
>> public    jobs    i1511464305263    CREATE INDEX i1511464305263 ON jobs USING btree (status, processid)
>> public    jobs    i1511464305262    CREATE INDEX i1511464305262 ON jobs USING btree (connectionname)
>> public    jobs    i1511464305261    CREATE INDEX i1511464305261 ON jobs USING btree (failtime)
>>
>>
>>
>> Shouldn’t i1511464305264 be used?
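
With only a couple of dozen rows in jobs, the planner will normally prefer the
sequential scan over i1511464305264. A quick, session-local experiment to
compare the two plans (a sketch; note that EXPLAIN ANALYZE really executes the
statement, hence the ROLLBACK):

BEGIN;
-- Plan as chosen by the planner (expected: Seq Scan, as in the log above).
EXPLAIN ANALYZE
SELECT id FROM jobs WHERE (status='A' OR status='W' OR status='R') FOR UPDATE;

-- Discourage sequential scans for this transaction only, to see what the
-- plan over the (status, id, priority) index would cost by comparison.
SET LOCAL enable_seqscan = off;
EXPLAIN ANALYZE
SELECT id FROM jobs WHERE (status='A' OR status='W' OR status='R') FOR UPDATE;
ROLLBACK;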
>>
>>
>>
>> Many thanks in advance
>>
>> Markus
>>
>>
>>
>> *From:* Karl Wright [mailto:daddywri@gmail.com]
>> *Sent:* Wednesday, November 29, 2017 23:28
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Amazon RDS for PostgreSQL Support
>>
>>
>>
>> The plans look good for the carrydown execution, so I have to conclude
>> that the long-running queries are due to other considerations -- perhaps
>> concurrent vacuuming, perhaps other queries pounding the database.
>>
>>
>>
>> SharePoint is especially hard on the carrydown table -- it gets huge.
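
If the carrydown table is the hot spot, its statistics and dead-tuple cleanup
can be tuned per table without touching the global autovacuum settings; a
sketch with purely illustrative thresholds:

-- Refresh planner statistics for the carrydown table right away.
ANALYZE carrydown;

-- Let autovacuum visit this one table more aggressively than the default
-- (by default roughly 20% of the table must change before a vacuum runs,
-- which is a lot for a huge table; the values below are illustrative).
ALTER TABLE carrydown SET (autovacuum_vacuum_scale_factor = 0.02,
                           autovacuum_analyze_scale_factor = 0.01);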
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Wed, Nov 29, 2017 at 4:46 PM, Markus Schuch <
>> markus.schuch@deutschebahn.com> wrote:
>>
>> Hi,
>>
>>
>>
>> since nobody responded, we started to experiment.
>>
>>
>>
>> Setup:
>>
>> AWS RDS Postgres 9.6.3
>>
>> db.m4.xlarge (4 cores, 16 GB RAM)
>>
>> provisioned iops 4000
>>
>>
>>
>> Indexing performance for most jobs is pretty good, except for a bigger
>> SharePoint crawl (~200,000 docs).
>>
>>
>>
>> We are seeing a lot of long-running queries against the carrydown table here.
>>
>>
>>
>> Logfile:
>>
>> https://gist.githubusercontent.com/schuch/b553dc3f825e0c640a52ce3f530ec3e0/raw/34e493ef520d2272b25f7d9d8acc3f8417550c17/gistfile1.txt
>>
>>
>>
>> Settings:
>>
>> https://gist.githubusercontent.com/schuch/44fcbb312df37ab89a910024a56ff835/raw/eba6ff9bc82026f181d5687584e9dd99cdffb242/postgresql%2520RDS%2520settings
>>
>> (due to this discussion we left autovac on:
>> https://www.mail-archive.com/user@manifoldcf.apache.org/msg03512.html)
>>
>>
>>
>> CPU usage of the database is between 15% and 35%.
>>
>>
>>
>> Are we hitting bad automatic optimizations in the newer PostgreSQL version?
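
One way to tell a bad plan apart from time spent waiting, while such a query
is running (a sketch; the wait_event columns exist from PostgreSQL 9.6):

-- What are the currently long-running statements doing right now?
-- A wait_event_type of 'Lock' points at contention rather than a bad plan.
SELECT pid,
       now() - query_start AS runtime,
       state,
       wait_event_type,
       wait_event,
       left(query, 80)     AS query_text
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '30 seconds'
ORDER BY runtime DESC;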
>>
>>
>>
>> Many thanks in advance,
>>
>> Markus
>>
>>
>>
>>
>>
>
>
