manifoldcf-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erlend Garåsen <e.f.gara...@usit.uio.no>
Subject Re: Timeout problems with web crawling
Date Thu, 25 Apr 2013 11:45:02 GMT

Yes, and I will assign that issue to me, but possible reassign/unassign 
it in case I do not get time before I'm flying to US.

Erlend

On 25.04.13 12.17, Karl Wright wrote:
> Could you also open a ticket for (and hopefully submit a patch for) the
> problem you mentioned having to do with valid seeds being flagged as
> illegal?  I'm trying to get 1.2 out the door and that seems to me to be
> a problem we can actually fix.
>
> Karl
>
>
> On Wed, Apr 24, 2013 at 7:52 PM, Karl Wright <daddywri@gmail.com
> <mailto:daddywri@gmail.com>> wrote:
>
>     I created a test here that exercises the throttling code
>     thoroughly.  It does not hang with the fix.  I have not confirmed
>     that it *does* hang without it, though.
>
>     However, I feel reasonably certain that the underlying cause of this
>     issue is in fact your JVM.  If the change doesn't help, and if you
>     are not running the latest version of whatever JVM series you've
>     adopted, it may be wise to update to the latest version.
>
>     Karl
>
>
>
>     On Wed, Apr 24, 2013 at 11:01 AM, Karl Wright <daddywri@gmail.com
>     <mailto:daddywri@gmail.com>> wrote:
>
>         Hi Erlend,
>
>         The lock-up seems to be due to the throttling code getting
>         stuck.  This code has been largely unchanged for many versions,
>         so I think it is likely to happen because of changes in the
>         jvm's optimization algorithms.  I've checked in a fix which
>         should hopefully stop that from happening to trunk.  See
>         CONNECTORS-679.
>
>         Karl
>
>
>         On Wed, Apr 24, 2013 at 5:29 AM, Karl Wright <daddywri@gmail.com
>         <mailto:daddywri@gmail.com>> wrote:
>
>             I updated the ticket as well; all the EXPLAIN queries are
>             very fast at the moment.  The reason is that the checktime
>             value you used needs to be the current time or larger, which
>             is why I picked
>
>             1466628406182
>
>             Karl
>
>
>
>             On Wed, Apr 24, 2013 at 4:53 AM, Erlend Garåsen
>             <e.f.garasen@usit.uio.no <mailto:e.f.garasen@usit.uio.no>>
>             wrote:
>
>
>                 Thanks Karl!
>
>                 I have updated the ticket with the information you
>                 requested, but there is an email delay at the moment
>                 from Jira (or this list).
>
>                 Erlend
>
>
>                 On 24.04.13 01.37 <tel:24.04.13%2001.37>, Karl Wright wrote:
>
>                     Hi Erlend,
>
>                     I had a conversation with the PostgreSQL people.
>                       Have a look at ticket
>                     CONNECTORS-678; you'll want to get an EXPLAIN
>                     ANALYZE of the query I
>                     provided before on the live system - but update the
>                     checktime value so
>                     it catches more than zero records:
>
>                     EXPLAIN ANALYZE SELECT t0.id <http://t0.id>
>                     <http://t0.id>,t0.jobid,t0.__dochash,t0.
>
>                     docid,t0.status,t0.failtime,__t0.failcount,t0.priorityset
>                     FROM jobqueue
>                     t0  WHERE t0.status IN ('P','G') AND
>                     t0.checkaction='R' AND
>                     t0.checktime<=1566628406182 AND
>                           EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status
>                     IN ('A','a') AND
>                     t1.id <http://t1.id> <http://t1.id>=t0.jobid AND
>                     t1.priority=5) AND
>
>                     NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE
>                     t2.dochash=t0.dochash AND
>                     t2.status IN ('A','F','a','f','D','d') AND
>                     t2.jobid!=t0.jobid) AND
>                           NOT EXISTS(SELECT 'x' FROM prereqevents
>                     t3,events t4 WHERE t0.id <http://t0.id>
>                     <http://t0.id>=t3.owner AND t3.eventname=t4.name
>                     <http://t4.name> <http://t4.name>) ORDER
>
>                     BY t0.docpriority ASC LIMIT 100;
>
>                     I'd also like to see what the contents of your
>                     prereqevents table looks
>                     like on that system.  If you can do SELECT * from
>                     prereqevents; , and
>                     give me the first ten or so that  pop up, I can take
>                     it from there.
>
>                     Karl
>
>
>
>                     On Tue, Apr 23, 2013 at 10:22 AM, Karl Wright
>                     <daddywri@gmail.com <mailto:daddywri@gmail.com>
>                     <mailto:daddywri@gmail.com
>                     <mailto:daddywri@gmail.com>>> wrote:
>
>                          Hi Erlend,
>
>                          If you have access to psql and can connect to
>                     the production
>                          database, I'd like to explore whether the
>                     relatively recent change I
>                          made for MySQL is in fact breaking PostgreSQL
>                     stuffer thread queries.
>
>                          First, read this page:
>                     http://www.postgresql.org/__docs/current/static/indexes-__ordering.html
>                     <http://www.postgresql.org/docs/current/static/indexes-ordering.html>
>
>                          What we're trying to achieve is that the
>                     planner use the index whose
>                          first column is docpriority.  The way we will
>                     need to determine that
>                          is to open up psql, and "explain" the query in
>                     question, and then
>                          modify the query to see if it begins to plan
>                     correctly.
>
>                          In psql, you will need to assess the query in
>                     question.  To do that,
>                          execute the following:
>
>                          EXPLAIN SELECT t0.id <http://t0.id>
>
>                     <http://t0.id>,t0.jobid,t0.__dochash,t0.docid,t0.status,t0.__failtime,t0.failcount,t0.__priorityset
>
>                          FROM jobqueue t0  WHERE t0.status IN ('P','G') AND
>                          t0.checkaction='R' AND
>                     t0.checktime<=1366628406182 AND
>                               EXISTS(SELECT 'x' FROM jobs t1 WHERE
>                     t1.status IN ('A','a') AND
>                     t1.id <http://t1.id> <http://t1.id>=t0.jobid AND
>                     t1.priority=5) AND
>
>                               NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE
>                          t2.dochash=t0.dochash AND t2.status IN
>                     ('A','F','a','f','D','d') AND
>                          t2.jobid!=t0.jobid) AND
>                               NOT EXISTS(SELECT 'x' FROM prereqevents
>                     t3,events t4 WHERE
>                     t0.id <http://t0.id> <http://t0.id>=t3.owner AND
>                     t3.eventname=t4.name <http://t4.name>
>                          <http://t4.name>) ORDER BY t0.docpriority ASC
>                     LIMIT 100;
>
>
>
>                          The production job MUST be running or it is a
>                     meaningless exercise.
>
>                          What you want to see is that the outermost
>                     action is NOT a sort;
>                          instead, you want an
>
>                          Index Scan using i1362584563121
>
>
>                          If you get something else, try changing the
>                     ORDER BY clause to includeNULLS LAST orNULLS FIRST
>                     to see if that
>
>
>                          changes anything.  If not, I may ask you to
>                     execute a sample query or two and tell me what it
>                     returns, so I can assess
>
>                            whether the data is messed up in some way.
>
>
>
>                          Karl
>
>
>
>
>
>
>                          On Tue, Apr 23, 2013 at 10:03 AM, Erlend Garåsen
>                          <e.f.garasen@usit.uio.no
>                     <mailto:e.f.garasen@usit.uio.no>
>                     <mailto:e.f.garasen@usit.uio.__no
>                     <mailto:e.f.garasen@usit.uio.no>>> wrote:
>
>                              On 23.04.13 15.30, Karl Wright wrote:
>
>                                  What version of Postgresql is this?
>                       Given the same
>                                  Postgresql version,
>
>
>                              9.1.9
>
>                              "allow_system_table_mods";"____off"
>
>                              "application_name";"pgAdmin III - Query Tool"
>
>                     "archive_command";"/local/opt/____pgsql-9.1/bin/pgclarchive___wal.__sh
>
>                              -P %p -F %f -S dbpg-search"
>                              "archive_mode";"on"
>                              "archive_timeout";"0"
>                              "array_nulls";"on"
>                              "authentication_timeout";"____1min"
>                              "autovacuum";"on"
>                              "autovacuum_analyze_scale_____factor";"0.1"
>                              "autovacuum_analyze_threshold"____;"50"
>                              "autovacuum_freeze_max_age";"____200000000"
>                              "autovacuum_max_workers";"3"
>                              "autovacuum_naptime";"1min"
>                              "autovacuum_vacuum_cost_delay"____;"20ms"
>                              "autovacuum_vacuum_cost_limit"____;"-1"
>                              "autovacuum_vacuum_scale_____factor";"0.2"
>                              "autovacuum_vacuum_threshold";____"50"
>                              "backslash_quote";"safe_____encoding"
>
>                              "bgwriter_delay";"200ms"
>                              "bgwriter_lru_maxpages";"100"
>                              "bgwriter_lru_multiplier";"2"
>                              "block_size";"8192"
>                              "bonjour";"off"
>                              "bonjour_name";""
>                              "bytea_output";"escape"
>                              "check_function_bodies";"on"
>                              "checkpoint_completion_target"____;"0.5"
>
>                              "checkpoint_segments";"128"
>                              "checkpoint_timeout";"5min"
>                              "checkpoint_warning";"30s"
>                              "client_encoding";"UNICODE"
>                              "client_min_messages";"notice"
>                              "commit_delay";"0"
>                              "commit_siblings";"5"
>                              "constraint_exclusion";"____partition"
>
>                              "cpu_index_tuple_cost";"0.005"
>                              "cpu_operator_cost";"0.0025"
>                              "cpu_tuple_cost";"0.01"
>                              "cursor_tuple_fraction";"0.1"
>                              "custom_variable_classes";""
>                              "DateStyle";"ISO, MDY"
>                              "db_user_namespace";"off"
>                              "deadlock_timeout";"1s"
>                              "debug_assertions";"off"
>                              "debug_pretty_print";"on"
>                              "debug_print_parse";"off"
>                              "debug_print_plan";"off"
>                              "debug_print_rewritten";"off"
>                              "default_statistics_target";"____100"
>                              "default_tablespace";""
>
>                     "default_text_search_config";"____pg_catalog.english"
>                              "default_transaction_____deferrable";"off"
>                              "default_transaction_____isolation";"read
>                     committed"
>                              "default_transaction_read_____only";"off"
>                              "default_with_oids";"off"
>                              "effective_cache_size";"____16093MB"
>
>                              "effective_io_concurrency";"1"
>                              "enable_bitmapscan";"on"
>                              "enable_hashagg";"on"
>                              "enable_hashjoin";"on"
>                              "enable_indexscan";"on"
>                              "enable_material";"on"
>                              "enable_mergejoin";"on"
>                              "enable_nestloop";"on"
>                              "enable_seqscan";"on"
>                              "enable_sort";"on"
>                              "enable_tidscan";"on"
>                              "escape_string_warning";"on"
>                              "exit_on_error";"off"
>                              "extra_float_digits";"0"
>                              "from_collapse_limit";"8"
>                              "fsync";"on"
>                              "full_page_writes";"on"
>                              "geqo";"on"
>                              "geqo_effort";"5"
>                              "geqo_generations";"0"
>                              "geqo_pool_size";"0"
>                              "geqo_seed";"0"
>                              "geqo_selection_bias";"2"
>                              "geqo_threshold";"12"
>                              "gin_fuzzy_search_limit";"0"
>                              "hot_standby";"off"
>                              "hot_standby_feedback";"off"
>                              "ignore_system_indexes";"off"
>                              "integer_datetimes";"on"
>                              "IntervalStyle";"postgres"
>                              "join_collapse_limit";"8"
>                              "krb_caseins_users";"off"
>                              "krb_srvname";"postgres"
>                              "lc_collate";"C"
>                              "lc_ctype";"C"
>                              "lc_messages";"C"
>                              "lc_monetary";"C"
>                              "lc_numeric";"C"
>                              "lc_time";"C"
>                              "listen_addresses";"dbpg-____search.uio.no
>                     <http://dbpg-__search.uio.no>
>                              <http://dbpg-search.uio.no>"
>                              "lo_compat_privileges";"off"
>                              "local_preload_libraries";""
>                              "log_autovacuum_min_duration";____"-1"
>
>                              "log_checkpoints";"off"
>                              "log_connections";"off"
>                              "log_destination";"stderr"
>                              "log_disconnections";"off"
>                              "log_duration";"off"
>                              "log_error_verbosity";"____default"
>
>                              "log_executor_stats";"off"
>                              "log_file_mode";"0600"
>                              "log_hostname";"off"
>                              "log_line_prefix";"[%m] [dbpg-search]
>                     [%d:%u] [%r] [%p] [%x] "
>                              "log_lock_waits";"off"
>                              "log_min_duration_statement";"____-1"
>                              "log_min_error_statement";"____info"
>
>                              "log_min_messages";"notice"
>                              "log_parser_stats";"off"
>                              "log_planner_stats";"off"
>                              "log_rotation_age";"0"
>                              "log_rotation_size";"0"
>                              "log_statement";"none"
>                              "log_statement_stats";"off"
>                              "log_temp_files";"-1"
>                              "log_timezone";"Europe/Oslo"
>                              "log_truncate_on_rotation";"____on"
>
>                              "logging_collector";"on"
>                              "maintenance_work_mem";"128MB"
>                              "max_connections";"600"
>                              "max_files_per_process";"1000"
>                              "max_function_args";"100"
>                              "max_identifier_length";"63"
>                              "max_index_keys";"32"
>                              "max_locks_per_transaction";"____64"
>                              "max_pred_locks_per_____transaction";"64"
>                              "max_prepared_transactions";"____0"
>                              "max_stack_depth";"4MB"
>                              "max_standby_archive_delay";"____30s"
>                              "max_standby_streaming_delay";____"30s"
>
>                              "max_wal_senders";"0"
>                              "password_encryption";"on"
>                              "port";"5432"
>                              "post_auth_delay";"0"
>                              "pre_auth_delay";"0"
>                              "quote_all_identifiers";"off"
>                              "random_page_cost";"2"
>                              "replication_timeout";"1min"
>                              "restart_after_crash";"on"
>                              "search_path";""$user",public"
>                              "segment_size";"1GB"
>                              "seq_page_cost";"1"
>                              "server_encoding";"UTF8"
>                              "server_version";"9.1.9"
>                              "server_version_num";"90109"
>                              "session_replication_role";"____origin"
>
>                              "shared_buffers";"384MB"
>                              "silent_mode";"off"
>                              "sql_inheritance";"on"
>                              "ssl";"on"
>                              "ssl_renegotiation_limit";"____512MB"
>                              "standard_conforming_strings";____"off"
>                              "statement_timeout";"0"
>                              "superuser_reserved_____connections";"3"
>
>                              "synchronize_seqscans";"on"
>                              "synchronous_commit";"on"
>                              "synchronous_standby_names";""
>                              "syslog_facility";"local0"
>                              "syslog_ident";"postgres"
>                              "tcp_keepalives_count";"9"
>                              "tcp_keepalives_idle";"7200"
>                              "tcp_keepalives_interval";"75"
>                              "temp_buffers";"8MB"
>                              "temp_tablespaces";""
>                              "TimeZone";"Europe/Oslo"
>                              "timezone_abbreviations";"____Default"
>                              "trace_notify";"off"
>                              "trace_recovery_messages";"____log"
>                              "trace_sort";"off"
>                              "track_activities";"on"
>                              "track_activity_query_size";"____1024"
>
>                              "track_counts";"on"
>                              "track_functions";"none"
>                              "transaction_deferrable";"off"
>                              "transaction_isolation";"read committed"
>                              "transaction_read_only";"off"
>                              "transform_null_equals";"off"
>                              "unix_socket_group";""
>                              "unix_socket_permissions";"____0700"
>
>                              "update_process_title";"on"
>                              "vacuum_cost_delay";"0"
>                              "vacuum_cost_limit";"200"
>                              "vacuum_cost_page_dirty";"20"
>                              "vacuum_cost_page_hit";"1"
>                              "vacuum_cost_page_miss";"10"
>                              "vacuum_defer_cleanup_age";"0"
>                              "vacuum_freeze_min_age";"____50000000"
>                              "vacuum_freeze_table_age";"____150000000"
>
>                              "wal_block_size";"8192"
>                              "wal_buffers";"12MB"
>                              "wal_keep_segments";"0"
>                              "wal_level";"archive"
>                              "wal_receiver_status_interval"____;"10s"
>
>                              "wal_segment_size";"16MB"
>                              "wal_sender_delay";"1s"
>                              "wal_sync_method";"fdatasync"
>                              "wal_writer_delay";"200ms"
>                              "work_mem";"16MB"
>                              "xmlbinary";"base64"
>                              "xmloption";"content"
>                              "zero_damaged_pages";"off"
>
>
>
>                              --
>                              Erlend Garåsen
>                              Center for Information Technology Services
>                              University of Oslo
>                              P.O. Box 1086 Blindern, N-0317 OSLO, Norway
>                              Ph: (+47) 22840193
>                     <tel:%28%2B47%29%2022840193>
>                     <tel:%28%2B47%29%2022840193>, Fax: (+47)
>                              22852970 <tel:%28%2B47%29%2022852970>,
>                     Mobile: (+47) 91380968 <tel:%28%2B47%29%2091380968>
>                              <tel:%28%2B47%29%2091380968>, VIP: 31050
>
>
>
>
>
>                 --
>                 Erlend Garåsen
>                 Center for Information Technology Services
>                 University of Oslo
>                 P.O. Box 1086 Blindern, N-0317 OSLO, Norway
>                 Ph: (+47) 22840193 <tel:%28%2B47%29%2022840193>, Fax:
>                 (+47) 22852970 <tel:%28%2B47%29%2022852970>, Mobile:
>                 (+47) 91380968 <tel:%28%2B47%29%2091380968>, VIP: 31050
>
>
>
>
>


-- 
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050

Mime
View raw message