manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: hanging crawler
Date Mon, 24 Jun 2013 12:15:38 GMT
Hi Ahmet,

Sorry, Googlemail has a bug and keeps sending my mail before I am ready.

First, the following error indicates that a transaction should be retried:

org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database exception:
SQLException doing query (40001): ERROR: could not serialize access due to
read/write dependencies among transactions

The retry code is already in place, as is the code in the
DBInterfacePostgreSQL.java class that catches the exception.  However, the
failure here actually happens while trying to print out the EXPLAIN for a
long-running query, and I don't think we've ever seen an EXPLAIN take such a
long time before.
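
For illustration, retrying on SQLSTATE 40001 can be sketched roughly as
follows.  This is not the actual ManifoldCF code: the class name
SerializationRetry, the SqlAction interface, and the withRetry helper are all
hypothetical, and a real implementation would also roll back the failed
transaction (and probably back off) before each new attempt.

```java
import java.sql.SQLException;

public class SerializationRetry {
    // SQLSTATE that PostgreSQL reports for a serialization failure.
    static final String SERIALIZATION_FAILURE = "40001";

    // Hypothetical callback type: one attempt at running a transaction.
    interface SqlAction<T> {
        T run() throws SQLException;
    }

    // Runs the action, retrying only when it fails with SQLSTATE 40001.
    // Any other SQLException is rethrown immediately.
    static <T> T withRetry(SqlAction<T> action, int maxAttempts) throws SQLException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SQLException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.run();
            } catch (SQLException e) {
                if (!SERIALIZATION_FAILURE.equals(e.getSQLState())) {
                    throw e; // not retryable
                }
                // Assumption: the caller's action starts a fresh transaction
                // on each attempt; a real caller would roll back here.
                last = e;
            }
        }
        throw last; // every attempt hit a serialization failure
    }
}
```

The point of the sketch is that an EXPLAIN issued inside the same transaction
would need to go through the same kind of wrapper as the query itself.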

The second error occurs because the transaction has been aborted by
PostgreSQL but ManifoldCF isn't yet aware of it.  When ManifoldCF sees a
database error it does not recognize, it tries to reset all connections.
This logic may or may not work properly; I have seen it hang before.
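
To make the distinction between the two errors concrete, here is a minimal
sketch of classifying the SQLSTATE codes involved.  The class
SqlStateClassifier and its Action enum are hypothetical names for
illustration only and do not reflect how ManifoldCF actually decides what to
do; the SQLSTATE values themselves are the standard PostgreSQL codes.

```java
public class SqlStateClassifier {
    // Hypothetical categories for how a caller might react to an error.
    enum Action { RETRY_TRANSACTION, ROLLBACK_FIRST, UNKNOWN }

    // Maps a PostgreSQL SQLSTATE to a suggested reaction.
    static Action classify(String sqlState) {
        if (sqlState == null) {
            return Action.UNKNOWN;
        }
        switch (sqlState) {
            case "40001": // serialization_failure
            case "40P01": // deadlock_detected
                return Action.RETRY_TRANSACTION;
            case "25P02": // in_failed_sql_transaction: an earlier statement
                          // already aborted this transaction
                return Action.ROLLBACK_FIRST;
            default:
                return Action.UNKNOWN;
        }
    }
}
```

Under this scheme, the 25P02 error in the log would be treated as a
consequence of the earlier 40001 failure rather than as an unknown error
that triggers a reset of all connections.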

So I think what has happened is: (a) you had a really long running
"addDocuments()" transaction, and (b) it was so long that it tried to print
an EXPLAIN for it, and (c) that failed.  Then the reset logic hung
ManifoldCF.

So there are two bugs here:
- Reset logic sometimes hangs ManifoldCF
- EXPLAIN may require a retry

Can you create tickets for both of these?

Thanks,

Karl



On Mon, Jun 24, 2013 at 8:05 AM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Ahmet,
>
> Several things are happening here.
>
> First, the following error indicates that a transaction should be retried:
>
> What is happening is that the database connections are being pooled, and
> they are
>
>
> On Mon, Jun 24, 2013 at 7:59 AM, Ahmet Arslan <iorixxx@yahoo.com> wrote:
>
>> Hello All,
>>
>> I have an MCF 1.2 setup (with postgresql-9.2) where I crawl some
>> newspaper sites using Web connectors.
>>
>> I use the following settings for jobs:
>>
>> Maximum hop count for link type 'link': 1
>> Maximum hop count for link type 'redirect': Unlimited
>> Hop count mode: No deletes, forever
>>
>> Start method: Start at beginning of schedule window
>> Schedule type: Scan every document once
>> Maximum run time: 90 minutes
>>
>> I scheduled the jobs to run every two hours. However, after a while the
>> crawl hangs. I found these exceptions in the log.
>>
>> What could be wrong? Any suggestions?
>>
>> Thanks,
>> Ahmet
>>
>> ERROR 2013-06-24 10:39:34,999 (Worker thread '1') - Worker thread
>> aborting and restarting due to database connection reset: Database
>> exception: SQLException doing query (25P02): ERROR: current transaction is
>> aborted, commands ignored until end of transaction block
>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database
>> exception: SQLException doing query (25P02): ERROR: current transaction is
>> aborted, commands ignored until end of transaction block
>> at
>> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:717)
>> at
>> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:745)
>>  at
>> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1430)
>> at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
>>  at
>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:186)
>> at
>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:822)
>>  at
>> org.apache.manifoldcf.crawler.jobs.JobManager.addDocuments(JobManager.java:4148)
>> at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.processDocumentReferences(WorkerThread.java:2017)
>>  at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.flush(WorkerThread.java:1948)
>> at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:562)
>> Caused by: org.postgresql.util.PSQLException: ERROR: current transaction
>> is aborted, commands ignored until end of transaction block
>> at
>> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
>>  at
>> org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
>> at
>> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
>>  at
>> org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
>> at
>> org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
>>  at
>> org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273)
>> at org.apache.manifoldcf.core.database.Database.execute(Database.java:862)
>>  at
>> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:677)
>>  ERROR 2013-06-24 10:39:33,473 (Worker thread '1') - Explain failed with
>> error Database exception: SQLException doing query (40001): ERROR: could
>> not serialize access due to read/write dependencies among transactions
>>   Detail: Reason code: Canceled on identification as a pivot, during
>> conflict out checking.
>>   Hint: The transaction might succeed if retried.
>> org.apache.manifoldcf.core.interfaces.ManifoldCFException: Database
>> exception: SQLException doing query (40001): ERROR: could not serialize
>> access due to read/write dependencies among transactions
>>   Detail: Reason code: Canceled on identification as a pivot, during
>> conflict out checking.
>>   Hint: The transaction might succeed if retried.
>> at
>> org.apache.manifoldcf.core.database.Database.executeViaThread(Database.java:717)
>>  at
>> org.apache.manifoldcf.core.database.Database.executeUncachedQuery(Database.java:745)
>> at
>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.explainQuery(DBInterfacePostgreSQL.java:1233)
>>  at
>> org.apache.manifoldcf.core.database.Database$QueryCacheExecutor.create(Database.java:1449)
>> at
>> org.apache.manifoldcf.core.cachemanager.CacheManager.findObjectsAndExecute(CacheManager.java:144)
>>  at
>> org.apache.manifoldcf.core.database.Database.executeQuery(Database.java:186)
>> at
>> org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.performQuery(DBInterfacePostgreSQL.java:822)
>>  at
>> org.apache.manifoldcf.crawler.jobs.JobManager.addDocuments(JobManager.java:4148)
>> at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.processDocumentReferences(WorkerThread.java:2017)
>>  at
>> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.flush(WorkerThread.java:1948)
>> at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:562)
>> Caused by: org.postgresql.util.PSQLException: ERROR: could not serialize
>> access due to read/write dependencies among transactions
>>   Detail: Reason code: Canceled on identification as a pivot, during
>> conflict out checking.
>>   Hint: The transaction might succeed if retried.
>> at
>> org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2102)
>>  at
>> org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1835)
>> at
>> org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
>>  at
>> org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:500)
>> at
>> org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:388)
>>  at
>> org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:273)
>> at org.apache.manifoldcf.core.database.Database.execute(Database.java:862)
>>  at
>> org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:677)
>>
>
>
