manifoldcf-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Karl Wright <daddy...@gmail.com>
Subject Re: Timeout problems with web crawling
Date Wed, 24 Apr 2013 15:01:47 GMT
Hi Erlend,

The lock-up seems to be due to the throttling code getting stuck.  This
code has been largely unchanged for many versions, so I think it is likely
to happen because of changes in the jvm's optimization algorithms.  I've
checked in a fix which should hopefully stop that from happening to trunk.
See CONNECTORS-679.

Karl


On Wed, Apr 24, 2013 at 5:29 AM, Karl Wright <daddywri@gmail.com> wrote:

> I updated the ticket as well; all the EXPLAIN queries are very fast at the
> moment.  The reason is that the checktime value you used needs to be the
> current time or larger, which is why I picked
>
> 1466628406182
>
> Karl
>
>
>
> On Wed, Apr 24, 2013 at 4:53 AM, Erlend Garåsen <e.f.garasen@usit.uio.no>wrote:
>
>>
>> Thanks Karl!
>>
>> I have updated the ticket with the information you requested, but there
>> is an email delay at the moment from Jira (or this list).
>>
>> Erlend
>>
>>
>> On 24.04.13 01.37, Karl Wright wrote:
>>
>>> Hi Erlend,
>>>
>>> I had a conversation with the PostgreSQL people.  Have a look at ticket
>>> CONNECTORS-678; you'll want to get an EXPLAIN ANALYZE of the query I
>>> provided before on the live system - but update the checktime value so
>>> it catches more than zero records:
>>>
>>> EXPLAIN ANALYZE SELECT t0.id <http://t0.id>,t0.jobid,t0.**dochash,t0.
>>>
>>> docid,t0.status,t0.failtime,**t0.failcount,t0.priorityset FROM jobqueue
>>> t0  WHERE t0.status IN ('P','G') AND t0.checkaction='R' AND
>>> t0.checktime<=1566628406182 AND
>>>      EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND
>>> t1.id <http://t1.id>=t0.jobid AND t1.priority=5) AND
>>>
>>> NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND
>>> t2.status IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND
>>>      NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id
>>> <http://t0.id>=t3.owner AND t3.eventname=t4.name <http://t4.name>)
ORDER
>>>
>>> BY t0.docpriority ASC LIMIT 100;
>>>
>>> I'd also like to see what the contents of your prereqevents table looks
>>> like on that system.  If you can do SELECT * from prereqevents; , and
>>> give me the first ten or so that  pop up, I can take it from there.
>>>
>>> Karl
>>>
>>>
>>>
>>> On Tue, Apr 23, 2013 at 10:22 AM, Karl Wright <daddywri@gmail.com
>>> <mailto:daddywri@gmail.com>> wrote:
>>>
>>>     Hi Erlend,
>>>
>>>     If you have access to psql and can connect to the production
>>>     database, I'd like to explore whether the relatively recent change I
>>>     made for MySQL is in fact breaking PostgreSQL stuffer thread queries.
>>>
>>>     First, read this page:
>>>     http://www.postgresql.org/**docs/current/static/indexes-**
>>> ordering.html<http://www.postgresql.org/docs/current/static/indexes-ordering.html>
>>>
>>>     What we're trying to achieve is that the planner use the index whose
>>>     first column is docpriority.  The way we will need to determine that
>>>     is to open up psql, and "explain" the query in question, and then
>>>     modify the query to see if it begins to plan correctly.
>>>
>>>     In psql, you will need to assess the query in question.  To do that,
>>>     execute the following:
>>>
>>>     EXPLAIN SELECT t0.id
>>>     <http://t0.id>,t0.jobid,t0.**dochash,t0.docid,t0.status,t0.**
>>> failtime,t0.failcount,t0.**priorityset
>>>
>>>     FROM jobqueue t0  WHERE t0.status IN ('P','G') AND
>>>     t0.checkaction='R' AND t0.checktime<=1366628406182 AND
>>>          EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND
>>>     t1.id <http://t1.id>=t0.jobid AND t1.priority=5) AND
>>>
>>>          NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE
>>>     t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND
>>>     t2.jobid!=t0.jobid) AND
>>>          NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE
>>>     t0.id <http://t0.id>=t3.owner AND t3.eventname=t4.name
>>>     <http://t4.name>) ORDER BY t0.docpriority ASC LIMIT 100;
>>>
>>>
>>>
>>>     The production job MUST be running or it is a meaningless exercise.
>>>
>>>     What you want to see is that the outermost action is NOT a sort;
>>>     instead, you want an
>>>
>>>     Index Scan using i1362584563121
>>>
>>>
>>>     If you get something else, try changing the ORDER BY clause to
>>> includeNULLS LAST orNULLS FIRST to see if that
>>>
>>>
>>>     changes anything.  If not, I may ask you to execute a sample query
>>> or two and tell me what it returns, so I can assess
>>>
>>>       whether the data is messed up in some way.
>>>
>>>
>>>
>>>     Karl
>>>
>>>
>>>
>>>
>>>
>>>
>>>     On Tue, Apr 23, 2013 at 10:03 AM, Erlend Garåsen
>>>     <e.f.garasen@usit.uio.no <mailto:e.f.garasen@usit.uio.**no<e.f.garasen@usit.uio.no>>>
>>> wrote:
>>>
>>>         On 23.04.13 15.30, Karl Wright wrote:
>>>
>>>             What version of Postgresql is this?  Given the same
>>>             Postgresql version,
>>>
>>>
>>>         9.1.9
>>>
>>>         "allow_system_table_mods";"__**off"
>>>
>>>         "application_name";"pgAdmin III - Query Tool"
>>>         "archive_command";"/local/opt/**__pgsql-9.1/bin/pgclarchive_**
>>> wal.__sh
>>>
>>>         -P %p -F %f -S dbpg-search"
>>>         "archive_mode";"on"
>>>         "archive_timeout";"0"
>>>         "array_nulls";"on"
>>>         "authentication_timeout";"__**1min"
>>>         "autovacuum";"on"
>>>         "autovacuum_analyze_scale___**factor";"0.1"
>>>         "autovacuum_analyze_threshold"**__;"50"
>>>         "autovacuum_freeze_max_age";"_**_200000000"
>>>         "autovacuum_max_workers";"3"
>>>         "autovacuum_naptime";"1min"
>>>         "autovacuum_vacuum_cost_delay"**__;"20ms"
>>>         "autovacuum_vacuum_cost_limit"**__;"-1"
>>>         "autovacuum_vacuum_scale___**factor";"0.2"
>>>         "autovacuum_vacuum_threshold";**__"50"
>>>         "backslash_quote";"safe___**encoding"
>>>
>>>         "bgwriter_delay";"200ms"
>>>         "bgwriter_lru_maxpages";"100"
>>>         "bgwriter_lru_multiplier";"2"
>>>         "block_size";"8192"
>>>         "bonjour";"off"
>>>         "bonjour_name";""
>>>         "bytea_output";"escape"
>>>         "check_function_bodies";"on"
>>>         "checkpoint_completion_target"**__;"0.5"
>>>
>>>         "checkpoint_segments";"128"
>>>         "checkpoint_timeout";"5min"
>>>         "checkpoint_warning";"30s"
>>>         "client_encoding";"UNICODE"
>>>         "client_min_messages";"notice"
>>>         "commit_delay";"0"
>>>         "commit_siblings";"5"
>>>         "constraint_exclusion";"__**partition"
>>>
>>>         "cpu_index_tuple_cost";"0.005"
>>>         "cpu_operator_cost";"0.0025"
>>>         "cpu_tuple_cost";"0.01"
>>>         "cursor_tuple_fraction";"0.1"
>>>         "custom_variable_classes";""
>>>         "DateStyle";"ISO, MDY"
>>>         "db_user_namespace";"off"
>>>         "deadlock_timeout";"1s"
>>>         "debug_assertions";"off"
>>>         "debug_pretty_print";"on"
>>>         "debug_print_parse";"off"
>>>         "debug_print_plan";"off"
>>>         "debug_print_rewritten";"off"
>>>         "default_statistics_target";"_**_100"
>>>         "default_tablespace";""
>>>         "default_text_search_config";"**__pg_catalog.english"
>>>         "default_transaction___**deferrable";"off"
>>>         "default_transaction___**isolation";"read committed"
>>>         "default_transaction_read___**only";"off"
>>>         "default_with_oids";"off"
>>>         "effective_cache_size";"__**16093MB"
>>>
>>>         "effective_io_concurrency";"1"
>>>         "enable_bitmapscan";"on"
>>>         "enable_hashagg";"on"
>>>         "enable_hashjoin";"on"
>>>         "enable_indexscan";"on"
>>>         "enable_material";"on"
>>>         "enable_mergejoin";"on"
>>>         "enable_nestloop";"on"
>>>         "enable_seqscan";"on"
>>>         "enable_sort";"on"
>>>         "enable_tidscan";"on"
>>>         "escape_string_warning";"on"
>>>         "exit_on_error";"off"
>>>         "extra_float_digits";"0"
>>>         "from_collapse_limit";"8"
>>>         "fsync";"on"
>>>         "full_page_writes";"on"
>>>         "geqo";"on"
>>>         "geqo_effort";"5"
>>>         "geqo_generations";"0"
>>>         "geqo_pool_size";"0"
>>>         "geqo_seed";"0"
>>>         "geqo_selection_bias";"2"
>>>         "geqo_threshold";"12"
>>>         "gin_fuzzy_search_limit";"0"
>>>         "hot_standby";"off"
>>>         "hot_standby_feedback";"off"
>>>         "ignore_system_indexes";"off"
>>>         "integer_datetimes";"on"
>>>         "IntervalStyle";"postgres"
>>>         "join_collapse_limit";"8"
>>>         "krb_caseins_users";"off"
>>>         "krb_srvname";"postgres"
>>>         "lc_collate";"C"
>>>         "lc_ctype";"C"
>>>         "lc_messages";"C"
>>>         "lc_monetary";"C"
>>>         "lc_numeric";"C"
>>>         "lc_time";"C"
>>>         "listen_addresses";"dbpg-__**search.uio.no<http://dbpg-__search.uio.no>
>>>         <http://dbpg-search.uio.no>"
>>>         "lo_compat_privileges";"off"
>>>         "local_preload_libraries";""
>>>         "log_autovacuum_min_duration";**__"-1"
>>>
>>>         "log_checkpoints";"off"
>>>         "log_connections";"off"
>>>         "log_destination";"stderr"
>>>         "log_disconnections";"off"
>>>         "log_duration";"off"
>>>         "log_error_verbosity";"__**default"
>>>
>>>         "log_executor_stats";"off"
>>>         "log_file_mode";"0600"
>>>         "log_hostname";"off"
>>>         "log_line_prefix";"[%m] [dbpg-search] [%d:%u] [%r] [%p] [%x] "
>>>         "log_lock_waits";"off"
>>>         "log_min_duration_statement";"**__-1"
>>>         "log_min_error_statement";"__**info"
>>>
>>>         "log_min_messages";"notice"
>>>         "log_parser_stats";"off"
>>>         "log_planner_stats";"off"
>>>         "log_rotation_age";"0"
>>>         "log_rotation_size";"0"
>>>         "log_statement";"none"
>>>         "log_statement_stats";"off"
>>>         "log_temp_files";"-1"
>>>         "log_timezone";"Europe/Oslo"
>>>         "log_truncate_on_rotation";"__**on"
>>>
>>>         "logging_collector";"on"
>>>         "maintenance_work_mem";"128MB"
>>>         "max_connections";"600"
>>>         "max_files_per_process";"1000"
>>>         "max_function_args";"100"
>>>         "max_identifier_length";"63"
>>>         "max_index_keys";"32"
>>>         "max_locks_per_transaction";"_**_64"
>>>         "max_pred_locks_per___**transaction";"64"
>>>         "max_prepared_transactions";"_**_0"
>>>         "max_stack_depth";"4MB"
>>>         "max_standby_archive_delay";"_**_30s"
>>>         "max_standby_streaming_delay";**__"30s"
>>>
>>>         "max_wal_senders";"0"
>>>         "password_encryption";"on"
>>>         "port";"5432"
>>>         "post_auth_delay";"0"
>>>         "pre_auth_delay";"0"
>>>         "quote_all_identifiers";"off"
>>>         "random_page_cost";"2"
>>>         "replication_timeout";"1min"
>>>         "restart_after_crash";"on"
>>>         "search_path";""$user",public"
>>>         "segment_size";"1GB"
>>>         "seq_page_cost";"1"
>>>         "server_encoding";"UTF8"
>>>         "server_version";"9.1.9"
>>>         "server_version_num";"90109"
>>>         "session_replication_role";"__**origin"
>>>
>>>         "shared_buffers";"384MB"
>>>         "silent_mode";"off"
>>>         "sql_inheritance";"on"
>>>         "ssl";"on"
>>>         "ssl_renegotiation_limit";"__**512MB"
>>>         "standard_conforming_strings";**__"off"
>>>         "statement_timeout";"0"
>>>         "superuser_reserved___**connections";"3"
>>>
>>>         "synchronize_seqscans";"on"
>>>         "synchronous_commit";"on"
>>>         "synchronous_standby_names";""
>>>         "syslog_facility";"local0"
>>>         "syslog_ident";"postgres"
>>>         "tcp_keepalives_count";"9"
>>>         "tcp_keepalives_idle";"7200"
>>>         "tcp_keepalives_interval";"75"
>>>         "temp_buffers";"8MB"
>>>         "temp_tablespaces";""
>>>         "TimeZone";"Europe/Oslo"
>>>         "timezone_abbreviations";"__**Default"
>>>         "trace_notify";"off"
>>>         "trace_recovery_messages";"__**log"
>>>         "trace_sort";"off"
>>>         "track_activities";"on"
>>>         "track_activity_query_size";"_**_1024"
>>>
>>>         "track_counts";"on"
>>>         "track_functions";"none"
>>>         "transaction_deferrable";"off"
>>>         "transaction_isolation";"read committed"
>>>         "transaction_read_only";"off"
>>>         "transform_null_equals";"off"
>>>         "unix_socket_group";""
>>>         "unix_socket_permissions";"__**0700"
>>>
>>>         "update_process_title";"on"
>>>         "vacuum_cost_delay";"0"
>>>         "vacuum_cost_limit";"200"
>>>         "vacuum_cost_page_dirty";"20"
>>>         "vacuum_cost_page_hit";"1"
>>>         "vacuum_cost_page_miss";"10"
>>>         "vacuum_defer_cleanup_age";"0"
>>>         "vacuum_freeze_min_age";"__**50000000"
>>>         "vacuum_freeze_table_age";"__**150000000"
>>>
>>>         "wal_block_size";"8192"
>>>         "wal_buffers";"12MB"
>>>         "wal_keep_segments";"0"
>>>         "wal_level";"archive"
>>>         "wal_receiver_status_interval"**__;"10s"
>>>
>>>         "wal_segment_size";"16MB"
>>>         "wal_sender_delay";"1s"
>>>         "wal_sync_method";"fdatasync"
>>>         "wal_writer_delay";"200ms"
>>>         "work_mem";"16MB"
>>>         "xmlbinary";"base64"
>>>         "xmloption";"content"
>>>         "zero_damaged_pages";"off"
>>>
>>>
>>>
>>>         --
>>>         Erlend Garåsen
>>>         Center for Information Technology Services
>>>         University of Oslo
>>>         P.O. Box 1086 Blindern, N-0317 OSLO, Norway
>>>         Ph: (+47) 22840193 <tel:%28%2B47%29%2022840193>, Fax: (+47)
>>>         22852970 <tel:%28%2B47%29%2022852970>, Mobile: (+47) 91380968
>>>         <tel:%28%2B47%29%2091380968>, VIP: 31050
>>>
>>>
>>>
>>>
>>
>> --
>> Erlend Garåsen
>> Center for Information Technology Services
>> University of Oslo
>> P.O. Box 1086 Blindern, N-0317 OSLO, Norway
>> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP:
>> 31050
>>
>
>

Mime
View raw message