I created a test here that exercises the throttling code thoroughly.  It does not hang with the fix.  I have not confirmed that it *does* hang without it, though.

However, I feel reasonably certain that the underlying cause of this issue is in fact your JVM.  If the change doesn't help, and if you are not running the latest version of whatever JVM series you've adopted, it may be wise to update to the latest version.

Karl



On Wed, Apr 24, 2013 at 11:01 AM, Karl Wright <daddywri@gmail.com> wrote:
Hi Erlend,

The lock-up seems to be due to the throttling code getting stuck.  This code has been largely unchanged for many versions, so I think it is likely to happen because of changes in the jvm's optimization algorithms.  I've checked in a fix which should hopefully stop that from happening to trunk.  See CONNECTORS-679.

Karl


On Wed, Apr 24, 2013 at 5:29 AM, Karl Wright <daddywri@gmail.com> wrote:
I updated the ticket as well; all the EXPLAIN queries are very fast at the moment.  The reason is that the checktime value you used needs to be the current time or larger, which is why I picked
1466628406182

Karl


On Wed, Apr 24, 2013 at 4:53 AM, Erlend Garåsen <e.f.garasen@usit.uio.no> wrote:

Thanks Karl!

I have updated the ticket with the information you requested, but there is an email delay at the moment from Jira (or this list).

Erlend


On 24.04.13 01.37, Karl Wright wrote:
Hi Erlend,

I had a conversation with the PostgreSQL people.  Have a look at ticket
CONNECTORS-678; you'll want to get an EXPLAIN ANALYZE of the query I
provided before on the live system - but update the checktime value so
it catches more than zero records:

EXPLAIN ANALYZE SELECT t0.id <http://t0.id>,t0.jobid,t0.dochash,t0.

docid,t0.status,t0.failtime,t0.failcount,t0.priorityset FROM jobqueue
t0  WHERE t0.status IN ('P','G') AND t0.checkaction='R' AND
t0.checktime<=1566628406182 AND
     EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND
t1.id <http://t1.id>=t0.jobid AND t1.priority=5) AND

NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND
t2.status IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND
     NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id
<http://t0.id>=t3.owner AND t3.eventname=t4.name <http://t4.name>) ORDER

BY t0.docpriority ASC LIMIT 100;

I'd also like to see what the contents of your prereqevents table looks
like on that system.  If you can do SELECT * from prereqevents; , and
give me the first ten or so that  pop up, I can take it from there.

Karl



On Tue, Apr 23, 2013 at 10:22 AM, Karl Wright <daddywri@gmail.com
<mailto:daddywri@gmail.com>> wrote:

    Hi Erlend,

    If you have access to psql and can connect to the production
    database, I'd like to explore whether the relatively recent change I
    made for MySQL is in fact breaking PostgreSQL stuffer thread queries.

    First, read this page:
    http://www.postgresql.org/docs/current/static/indexes-ordering.html

    What we're trying to achieve is that the planner use the index whose
    first column is docpriority.  The way we will need to determine that
    is to open up psql, and "explain" the query in question, and then
    modify the query to see if it begins to plan correctly.

    In psql, you will need to assess the query in question.  To do that,
    execute the following:

    EXPLAIN SELECT t0.id
    <http://t0.id>,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset

    FROM jobqueue t0  WHERE t0.status IN ('P','G') AND
    t0.checkaction='R' AND t0.checktime<=1366628406182 AND
         EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND
    t1.id <http://t1.id>=t0.jobid AND t1.priority=5) AND

         NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE
    t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND
    t2.jobid!=t0.jobid) AND
         NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE
    t0.id <http://t0.id>=t3.owner AND t3.eventname=t4.name
    <http://t4.name>) ORDER BY t0.docpriority ASC LIMIT 100;



    The production job MUST be running or it is a meaningless exercise.

    What you want to see is that the outermost action is NOT a sort;
    instead, you want an

    Index Scan using i1362584563121


    If you get something else, try changing the ORDER BY clause to includeNULLS LAST orNULLS FIRST to see if that


    changes anything.  If not, I may ask you to execute a sample query or two and tell me what it returns, so I can assess

      whether the data is messed up in some way.



    Karl






    On Tue, Apr 23, 2013 at 10:03 AM, Erlend Garåsen
    <e.f.garasen@usit.uio.no <mailto:e.f.garasen@usit.uio.no>> wrote:

        On 23.04.13 15.30, Karl Wright wrote:

            What version of Postgresql is this?  Given the same
            Postgresql version,


        9.1.9

        "allow_system_table_mods";"__off"

        "application_name";"pgAdmin III - Query Tool"
        "archive_command";"/local/opt/__pgsql-9.1/bin/pgclarchive_wal.__sh

        -P %p -F %f -S dbpg-search"
        "archive_mode";"on"
        "archive_timeout";"0"
        "array_nulls";"on"
        "authentication_timeout";"__1min"
        "autovacuum";"on"
        "autovacuum_analyze_scale___factor";"0.1"
        "autovacuum_analyze_threshold"__;"50"
        "autovacuum_freeze_max_age";"__200000000"
        "autovacuum_max_workers";"3"
        "autovacuum_naptime";"1min"
        "autovacuum_vacuum_cost_delay"__;"20ms"
        "autovacuum_vacuum_cost_limit"__;"-1"
        "autovacuum_vacuum_scale___factor";"0.2"
        "autovacuum_vacuum_threshold";__"50"
        "backslash_quote";"safe___encoding"

        "bgwriter_delay";"200ms"
        "bgwriter_lru_maxpages";"100"
        "bgwriter_lru_multiplier";"2"
        "block_size";"8192"
        "bonjour";"off"
        "bonjour_name";""
        "bytea_output";"escape"
        "check_function_bodies";"on"
        "checkpoint_completion_target"__;"0.5"

        "checkpoint_segments";"128"
        "checkpoint_timeout";"5min"
        "checkpoint_warning";"30s"
        "client_encoding";"UNICODE"
        "client_min_messages";"notice"
        "commit_delay";"0"
        "commit_siblings";"5"
        "constraint_exclusion";"__partition"

        "cpu_index_tuple_cost";"0.005"
        "cpu_operator_cost";"0.0025"
        "cpu_tuple_cost";"0.01"
        "cursor_tuple_fraction";"0.1"
        "custom_variable_classes";""
        "DateStyle";"ISO, MDY"
        "db_user_namespace";"off"
        "deadlock_timeout";"1s"
        "debug_assertions";"off"
        "debug_pretty_print";"on"
        "debug_print_parse";"off"
        "debug_print_plan";"off"
        "debug_print_rewritten";"off"
        "default_statistics_target";"__100"
        "default_tablespace";""
        "default_text_search_config";"__pg_catalog.english"
        "default_transaction___deferrable";"off"
        "default_transaction___isolation";"read committed"
        "default_transaction_read___only";"off"
        "default_with_oids";"off"
        "effective_cache_size";"__16093MB"

        "effective_io_concurrency";"1"
        "enable_bitmapscan";"on"
        "enable_hashagg";"on"
        "enable_hashjoin";"on"
        "enable_indexscan";"on"
        "enable_material";"on"
        "enable_mergejoin";"on"
        "enable_nestloop";"on"
        "enable_seqscan";"on"
        "enable_sort";"on"
        "enable_tidscan";"on"
        "escape_string_warning";"on"
        "exit_on_error";"off"
        "extra_float_digits";"0"
        "from_collapse_limit";"8"
        "fsync";"on"
        "full_page_writes";"on"
        "geqo";"on"
        "geqo_effort";"5"
        "geqo_generations";"0"
        "geqo_pool_size";"0"
        "geqo_seed";"0"
        "geqo_selection_bias";"2"
        "geqo_threshold";"12"
        "gin_fuzzy_search_limit";"0"
        "hot_standby";"off"
        "hot_standby_feedback";"off"
        "ignore_system_indexes";"off"
        "integer_datetimes";"on"
        "IntervalStyle";"postgres"
        "join_collapse_limit";"8"
        "krb_caseins_users";"off"
        "krb_srvname";"postgres"
        "lc_collate";"C"
        "lc_ctype";"C"
        "lc_messages";"C"
        "lc_monetary";"C"
        "lc_numeric";"C"
        "lc_time";"C"
        "listen_addresses";"dbpg-__search.uio.no
        <http://dbpg-search.uio.no>"
        "lo_compat_privileges";"off"
        "local_preload_libraries";""
        "log_autovacuum_min_duration";__"-1"

        "log_checkpoints";"off"
        "log_connections";"off"
        "log_destination";"stderr"
        "log_disconnections";"off"
        "log_duration";"off"
        "log_error_verbosity";"__default"

        "log_executor_stats";"off"
        "log_file_mode";"0600"
        "log_hostname";"off"
        "log_line_prefix";"[%m] [dbpg-search] [%d:%u] [%r] [%p] [%x] "
        "log_lock_waits";"off"
        "log_min_duration_statement";"__-1"
        "log_min_error_statement";"__info"

        "log_min_messages";"notice"
        "log_parser_stats";"off"
        "log_planner_stats";"off"
        "log_rotation_age";"0"
        "log_rotation_size";"0"
        "log_statement";"none"
        "log_statement_stats";"off"
        "log_temp_files";"-1"
        "log_timezone";"Europe/Oslo"
        "log_truncate_on_rotation";"__on"

        "logging_collector";"on"
        "maintenance_work_mem";"128MB"
        "max_connections";"600"
        "max_files_per_process";"1000"
        "max_function_args";"100"
        "max_identifier_length";"63"
        "max_index_keys";"32"
        "max_locks_per_transaction";"__64"
        "max_pred_locks_per___transaction";"64"
        "max_prepared_transactions";"__0"
        "max_stack_depth";"4MB"
        "max_standby_archive_delay";"__30s"
        "max_standby_streaming_delay";__"30s"

        "max_wal_senders";"0"
        "password_encryption";"on"
        "port";"5432"
        "post_auth_delay";"0"
        "pre_auth_delay";"0"
        "quote_all_identifiers";"off"
        "random_page_cost";"2"
        "replication_timeout";"1min"
        "restart_after_crash";"on"
        "search_path";""$user",public"
        "segment_size";"1GB"
        "seq_page_cost";"1"
        "server_encoding";"UTF8"
        "server_version";"9.1.9"
        "server_version_num";"90109"
        "session_replication_role";"__origin"

        "shared_buffers";"384MB"
        "silent_mode";"off"
        "sql_inheritance";"on"
        "ssl";"on"
        "ssl_renegotiation_limit";"__512MB"
        "standard_conforming_strings";__"off"
        "statement_timeout";"0"
        "superuser_reserved___connections";"3"

        "synchronize_seqscans";"on"
        "synchronous_commit";"on"
        "synchronous_standby_names";""
        "syslog_facility";"local0"
        "syslog_ident";"postgres"
        "tcp_keepalives_count";"9"
        "tcp_keepalives_idle";"7200"
        "tcp_keepalives_interval";"75"
        "temp_buffers";"8MB"
        "temp_tablespaces";""
        "TimeZone";"Europe/Oslo"
        "timezone_abbreviations";"__Default"
        "trace_notify";"off"
        "trace_recovery_messages";"__log"
        "trace_sort";"off"
        "track_activities";"on"
        "track_activity_query_size";"__1024"

        "track_counts";"on"
        "track_functions";"none"
        "transaction_deferrable";"off"
        "transaction_isolation";"read committed"
        "transaction_read_only";"off"
        "transform_null_equals";"off"
        "unix_socket_group";""
        "unix_socket_permissions";"__0700"

        "update_process_title";"on"
        "vacuum_cost_delay";"0"
        "vacuum_cost_limit";"200"
        "vacuum_cost_page_dirty";"20"
        "vacuum_cost_page_hit";"1"
        "vacuum_cost_page_miss";"10"
        "vacuum_defer_cleanup_age";"0"
        "vacuum_freeze_min_age";"__50000000"
        "vacuum_freeze_table_age";"__150000000"

        "wal_block_size";"8192"
        "wal_buffers";"12MB"
        "wal_keep_segments";"0"
        "wal_level";"archive"
        "wal_receiver_status_interval"__;"10s"

        "wal_segment_size";"16MB"
        "wal_sender_delay";"1s"
        "wal_sync_method";"fdatasync"
        "wal_writer_delay";"200ms"
        "work_mem";"16MB"
        "xmlbinary";"base64"
        "xmloption";"content"
        "zero_damaged_pages";"off"



        --
        Erlend Garåsen
        Center for Information Technology Services
        University of Oslo
        P.O. Box 1086 Blindern, N-0317 OSLO, Norway
        Ph: (+47) 22840193 <tel:%28%2B47%29%2022840193>, Fax: (+47)
        22852970 <tel:%28%2B47%29%2022852970>, Mobile: (+47) 91380968
        <tel:%28%2B47%29%2091380968>, VIP: 31050





--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050