The limit is applied in the method that calls noteTransformationConnectionRegistration.

Here it is:

>>>>>>
  /** Note the registration of a transformation connector used by the specified connections.
  * This method will be called when a connector is registered, on which the specified
  * connections depend.
  *@param connectionNames is the set of connection names.
  */
  @Override
  public void noteTransformationConnectorRegistration(String[] connectionNames)
    throws ManifoldCFException
  {
    // For each connection, find the corresponding list of jobs.  From these jobs, we want the job id and the status.
    List<String> list = new ArrayList<String>();
    int maxCount = database.findConjunctionClauseMax(new ClauseDescription[]{});
    int currentCount = 0;
    int i = 0;
    while (i < connectionNames.length)
    {
      if (currentCount == maxCount)
      {
        noteTransformationConnectionRegistration(list);
        list.clear();
        currentCount = 0;
      }

      list.add(connectionNames[i++]);
      currentCount++;
    }
    if (currentCount > 0)
      noteTransformationConnectionRegistration(list);
  }
<<<<<<

It looks correct now.  Do you see an issue with it?

Karl


On Mon, Jul 30, 2018 at 3:28 PM Mike Hugo <mike@piragua.com> wrote:
Nice catch Karl!

I applied that patch, but I'm still getting the same error.  

I think the problem is in JobManager.noteTransformationConnectionRegistration.

If jobs.findJobsMatchingTransformations(list) returns a large list of ids (as it does in our case: 39,941 ids), the generated query string still has a large OR clause in it.  I don't see getMaxOrClause applied to the query being built inside noteTransformationConnectionRegistration.

>>>>>>
 protected void noteTransformationConnectionRegistration(List<String> list)
    throws ManifoldCFException
  {
    // Query for the matching jobs, and then for each job potentially adjust the state
    Long[] jobIDs = jobs.findJobsMatchingTransformations(list);
    if (jobIDs.length == 0)
      return;

    StringBuilder query = new StringBuilder();
    ArrayList newList = new ArrayList();
    
    query.append("SELECT ").append(jobs.idField).append(",").append(jobs.statusField)
      .append(" FROM ").append(jobs.getTableName()).append(" WHERE ")
      .append(database.buildConjunctionClause(newList,new ClauseDescription[]{
        new MultiClause(jobs.idField,jobIDs)}))
      .append(" FOR UPDATE");
    IResultSet set = database.performQuery(query.toString(),newList,null,null);
    int i = 0;
    while (i < set.getRowCount())
    {
      IResultRow row = set.getRow(i++);
      Long jobID = (Long)row.getValue(jobs.idField);
      int statusValue = jobs.stringToStatus((String)row.getValue(jobs.statusField));
      jobs.noteTransformationConnectorRegistration(jobID,statusValue);
    }
  }
<<<<<<
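
A minimal sketch of one way the same limit could be applied to the job ids, assuming a hypothetical helper class named JobIdBatcher (it is not part of ManifoldCF) and the limit of 25 that getMaxOrClause() returns for PostgreSQL.  It only shows the batching step, not the query itself:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ManifoldCF: splits an array of job ids
// into batches that are no larger than the database's OR-clause limit.
final class JobIdBatcher
{
  static List<Long[]> partition(Long[] ids, int maxPerBatch)
  {
    if (maxPerBatch <= 0)
      throw new IllegalArgumentException("maxPerBatch must be positive");
    List<Long[]> batches = new ArrayList<Long[]>();
    for (int start = 0; start < ids.length; start += maxPerBatch)
    {
      // The last batch may be shorter than maxPerBatch.
      int end = Math.min(start + maxPerBatch, ids.length);
      batches.add(Arrays.copyOfRange(ids, start, end));
    }
    return batches;
  }

  public static void main(String[] args)
  {
    // Same number of ids as in the failing instance; the values are made up.
    Long[] ids = new Long[39941];
    for (int i = 0; i < ids.length; i++)
      ids[i] = Long.valueOf(i);
    List<Long[]> batches = partition(ids, 25);
    System.out.println(batches.size() + " batches; last batch has "
      + batches.get(batches.size() - 1).length + " ids");
  }
}
```

With 39,941 ids and a limit of 25 this gives 1,598 batches, the last one holding 16 ids.  Each batch could then be used as the value array of its own MultiClause, so that no single query carries more than 25 OR terms.  That is an assumption about how a fix might look, not the actual patch.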


On Mon, Jul 30, 2018 at 1:55 PM, Karl Wright <daddywri@gmail.com> wrote:
The Postgresql driver supposedly limits this to 25 clauses at a pop:

>>>>>>
  @Override
  public int getMaxOrClause()
  {
    return 25;
  }

  /* Calculate the number of values a particular clause can have, given the values for all the other clauses.
  * For example, if in the expression x AND y AND z, y has 2 values and z has 1, find out how many values x can legally have
  * when using the buildConjunctionClause() method below.
  */
  @Override
  public int findConjunctionClauseMax(ClauseDescription[] otherClauseDescriptions)
  {
    // This implementation uses "OR"
    return getMaxOrClause();
  }
<<<<<<

The problem is a cut-and-paste error, affecting only transformation connections, that defeated the limit.  I'll create a ticket and attach a patch.  CONNECTORS-1520.

Karl





On Mon, Jul 30, 2018 at 2:29 PM Karl Wright <daddywri@gmail.com> wrote:
Hi Mike,

This might be the issue indeed.  I'll look into it.

Karl


On Mon, Jul 30, 2018 at 2:26 PM Mike Hugo <mike@piragua.com> wrote:
I'm not sure what the solution is yet, but I think I may have found the culprit:

JobManager.noteTransformationConnectionRegistration(List<String> list) is creating a pretty big query:

SELECT id,status FROM jobs WHERE  (id=? OR id=? OR id=? OR id=? ........ OR id=?) FOR UPDATE

Replace the ellipsis with a list of 39,941 ids (it's a huge query when it prints out).

It seems that the database doesn't like that query and closes the connection before returning with a response.
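
To give a rough sense of the scale, here is a small sketch that builds the same shape of OR clause and measures it.  OrClauseSize and buildOrClause are hypothetical names used only for illustration; this is not the code ManifoldCF uses to build the clause:

```java
// Hypothetical illustration, not ManifoldCF code: builds a clause of the
// form "(id=? OR id=? OR ... OR id=?)" with one placeholder per value.
final class OrClauseSize
{
  static String buildOrClause(String field, int valueCount)
  {
    StringBuilder sb = new StringBuilder("(");
    for (int i = 0; i < valueCount; i++)
    {
      if (i > 0)
        sb.append(" OR ");
      sb.append(field).append("=?");
    }
    return sb.append(")").toString();
  }

  public static void main(String[] args)
  {
    // Compare a clause capped at 25 terms with the uncapped one from the log.
    System.out.println("25 terms: " + buildOrClause("id", 25).length() + " characters");
    System.out.println("39941 terms: " + buildOrClause("id", 39941).length() + " characters");
  }
}
```

A clause capped at 25 terms is 198 characters long, while the 39,941-term clause is 319,526 characters and needs 39,941 bound parameters in a single statement.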

As I mentioned, this instance of Manifold has nearly 40,000 web crawlers.  Is that a high number for Manifold to handle?

On Mon, Jul 30, 2018 at 10:58 AM, Karl Wright <daddywri@gmail.com> wrote:
Well, I have absolutely no idea what is wrong and I've never seen anything like that before.  But postgres is complaining because the communication with the JDBC client is being interrupted by something.

Karl


On Mon, Jul 30, 2018 at 10:39 AM Mike Hugo <mike@piragua.com> wrote:
No, and manifold and postgres run on the same host.

On Mon, Jul 30, 2018 at 9:35 AM, Karl Wright <daddywri@gmail.com> wrote:
' LOG:  incomplete message from client'

This shows a network issue.  Did your network configuration change recently?

Karl


On Mon, Jul 30, 2018 at 9:59 AM Mike Hugo <mike@piragua.com> wrote:
I tried a postgres vacuum and also a restart, but the problem persists.  Here's the log again with some additional logging details added (below).

I tried running the last query from the logs against the database and it works fine - I modified it to return a count and that also works.  

SELECT count(*) FROM jobs t1 WHERE EXISTS(SELECT 'x' FROM jobpipelines WHERE t1.id=ownerid AND transformationname='Tika');
 count
-------
 39941
(1 row)


Is 39k jobs a high number?  I've run some other instances of Manifold with more like 1,000 jobs and those seem to be working fine.  That's the only thing I can think of that's different between this instance that won't start and the others.  Any ideas?

Thanks for your help!

Mike

LOG:  duration: 0.079 ms  parse <unnamed>: SELECT connectionname FROM transformationconnections WHERE classname=$1
LOG:  duration: 0.079 ms  bind <unnamed>: SELECT connectionname FROM transformationconnections WHERE classname=$1
DETAIL:  parameters: $1 = 'org.apache.manifoldcf.agents.transformation.tika.TikaExtractor'
LOG:  duration: 0.017 ms  execute <unnamed>: SELECT connectionname FROM transformationconnections WHERE classname=$1
DETAIL:  parameters: $1 = 'org.apache.manifoldcf.agents.transformation.tika.TikaExtractor'
LOG:  duration: 0.039 ms  parse <unnamed>: SELECT * FROM agents
LOG:  duration: 0.040 ms  bind <unnamed>: SELECT * FROM agents
LOG:  duration: 0.010 ms  execute <unnamed>: SELECT * FROM agents
LOG:  duration: 0.084 ms  parse <unnamed>: SELECT id FROM jobs t1 WHERE EXISTS(SELECT 'x' FROM jobpipelines WHERE t1.id=ownerid AND transformationname=$1)
LOG:  duration: 0.359 ms  bind <unnamed>: SELECT id FROM jobs t1 WHERE EXISTS(SELECT 'x' FROM jobpipelines WHERE t1.id=ownerid AND transformationname=$1)
DETAIL:  parameters: $1 = 'Tika'
LOG:  duration: 77.622 ms  execute <unnamed>: SELECT id FROM jobs t1 WHERE EXISTS(SELECT 'x' FROM jobpipelines WHERE t1.id=ownerid AND transformationname=$1)
DETAIL:  parameters: $1 = 'Tika'
LOG:  incomplete message from client
LOG:  disconnection: session time: 0:00:06.574 user=REMOVED database=REMOVED host=127.0.0.1 port=45356
>2018-07-30 12:36:09,415 [main] ERROR org.apache.manifoldcf.root - Exception: This connection has been closed.
org.apache.manifoldcf.core.interfaces.ManifoldCFException: This connection has been closed.
at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reinterpretException(DBInterfacePostgreSQL.java:627) ~[mcf-core.jar:?]
at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.rollbackCurrentTransaction(DBInterfacePostgreSQL.java:1296) ~[mcf-core.jar:?]
at org.apache.manifoldcf.core.database.Database.endTransaction(Database.java:368) ~[mcf-core.jar:?]
at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.endTransaction(DBInterfacePostgreSQL.java:1236) ~[mcf-core.jar:?]
at org.apache.manifoldcf.crawler.system.ManifoldCF.registerConnectors(ManifoldCF.java:605) ~[mcf-pull-agent.jar:?]
at org.apache.manifoldcf.crawler.system.ManifoldCF.reregisterAllConnectors(ManifoldCF.java:160) ~[mcf-pull-agent.jar:?]
at org.apache.manifoldcf.jettyrunner.ManifoldCFJettyRunner.main(ManifoldCFJettyRunner.java:239) [mcf-jetty-runner.jar:?]
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:766) ~[postgresql-42.1.3.jar:42.1.3]
at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:1576) ~[postgresql-42.1.3.jar:42.1.3]
at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:367) ~[postgresql-42.1.3.jar:42.1.3]
at org.apache.manifoldcf.core.database.Database.execute(Database.java:873) ~[mcf-core.jar:?]
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696) ~[mcf-core.jar:?]
org.apache.manifoldcf.core.interfaces.ManifoldCFException: This connection has been closed.
at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reinterpretException(DBInterfacePostgreSQL.java:627)
at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.rollbackCurrentTransaction(DBInterfacePostgreSQL.java:1296)
at org.apache.manifoldcf.core.database.Database.endTransaction(Database.java:368)
at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.endTransaction(DBInterfacePostgreSQL.java:1236)
at org.apache.manifoldcf.crawler.system.ManifoldCF.registerConnectors(ManifoldCF.java:605)
at org.apache.manifoldcf.crawler.system.ManifoldCF.reregisterAllConnectors(ManifoldCF.java:160)
at org.apache.manifoldcf.jettyrunner.ManifoldCFJettyRunner.main(ManifoldCFJettyRunner.java:239)
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:766)
at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:1576)
at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:367)
at org.apache.manifoldcf.core.database.Database.execute(Database.java:873)
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)
LOG:  disconnection: session time: 0:00:10.677 user=postgres database=template1 host=127.0.0.1 port=45354



On Sun, Jul 29, 2018 at 8:09 AM, Karl Wright <daddywri@gmail.com> wrote:
It looks to me like your database server is not happy.  Maybe it's out of resources?  Not sure but a restart may be in order.

Karl


On Sun, Jul 29, 2018 at 9:06 AM Mike Hugo <mike@piragua.com> wrote:
Recently we started seeing this error when Manifold CF starts up.  We had been running Manifold CF with many web connectors and a few RSS feeds for a while and it had been working fine.  The server got rebooted and since then we started seeing this error. I'm not sure exactly what changed.  Any ideas as to where to start looking and how to fix this?

Thanks!

Mike  


Initial repository connections already created.
Configuration file successfully read
Successfully unregistered all domains
Successfully unregistered all output connectors
Successfully unregistered all transformation connectors
Successfully unregistered all mapping connectors
Successfully unregistered all authority connectors
Successfully unregistered all repository connectors
WARNING:  there is already a transaction in progress
WARNING:  there is no transaction in progress
Successfully registered output connector 'org.apache.manifoldcf.agents.output.solr.SolrConnector'
WARNING:  there is already a transaction in progress
WARNING:  there is no transaction in progress
Successfully registered output connector 'org.apache.manifoldcf.agents.output.searchblox.SearchBloxConnector'
WARNING:  there is already a transaction in progress
WARNING:  there is no transaction in progress
Successfully registered output connector 'org.apache.manifoldcf.agents.output.opensearchserver.OpenSearchServerConnector'
WARNING:  there is already a transaction in progress
WARNING:  there is no transaction in progress
Successfully registered output connector 'org.apache.manifoldcf.agents.output.nullconnector.NullConnector'
WARNING:  there is already a transaction in progress
WARNING:  there is no transaction in progress
Successfully registered output connector 'org.apache.manifoldcf.agents.output.kafka.KafkaOutputConnector'
WARNING:  there is already a transaction in progress
WARNING:  there is no transaction in progress
Successfully registered output connector 'org.apache.manifoldcf.agents.output.hdfs.HDFSOutputConnector'
WARNING:  there is already a transaction in progress
WARNING:  there is no transaction in progress
Successfully registered output connector 'org.apache.manifoldcf.agents.output.gts.GTSConnector'
WARNING:  there is already a transaction in progress
WARNING:  there is no transaction in progress
Successfully registered output connector 'org.apache.manifoldcf.agents.output.filesystem.FileOutputConnector'
WARNING:  there is already a transaction in progress
WARNING:  there is no transaction in progress
Successfully registered output connector 'org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchConnector'
WARNING:  there is already a transaction in progress
WARNING:  there is no transaction in progress
Successfully registered output connector 'org.apache.manifoldcf.agents.output.amazoncloudsearch.AmazonCloudSearchConnector'
WARNING:  there is already a transaction in progress
WARNING:  there is no transaction in progress
Successfully registered transformation connector 'org.apache.manifoldcf.agents.transformation.tikaservice.TikaExtractor'
WARNING:  there is already a transaction in progress
LOG:  incomplete message from client
>2018-07-29 13:02:06,659 [main] ERROR org.apache.manifoldcf.root - Exception: This connection has been closed.
org.apache.manifoldcf.core.interfaces.ManifoldCFException: This connection has been closed.
at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reinterpretException(DBInterfacePostgreSQL.java:627) ~[mcf-core.jar:?]
at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.rollbackCurrentTransaction(DBInterfacePostgreSQL.java:1296) ~[mcf-core.jar:?]
at org.apache.manifoldcf.core.database.Database.endTransaction(Database.java:368) ~[mcf-core.jar:?]
at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.endTransaction(DBInterfacePostgreSQL.java:1236) ~[mcf-core.jar:?]
at org.apache.manifoldcf.crawler.system.ManifoldCF.registerConnectors(ManifoldCF.java:605) ~[mcf-pull-agent.jar:?]
at org.apache.manifoldcf.crawler.system.ManifoldCF.reregisterAllConnectors(ManifoldCF.java:160) ~[mcf-pull-agent.jar:?]
at org.apache.manifoldcf.jettyrunner.ManifoldCFJettyRunner.main(ManifoldCFJettyRunner.java:239) [mcf-jetty-runner.jar:?]
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:766) ~[postgresql-42.1.3.jar:42.1.3]
at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:1576) ~[postgresql-42.1.3.jar:42.1.3]
at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:367) ~[postgresql-42.1.3.jar:42.1.3]
at org.apache.manifoldcf.core.database.Database.execute(Database.java:873) ~[mcf-core.jar:?]
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696) ~[mcf-core.jar:?]
org.apache.manifoldcf.core.interfaces.ManifoldCFException: This connection has been closed.
at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.reinterpretException(DBInterfacePostgreSQL.java:627)
at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.rollbackCurrentTransaction(DBInterfacePostgreSQL.java:1296)
at org.apache.manifoldcf.core.database.Database.endTransaction(Database.java:368)
at org.apache.manifoldcf.core.database.DBInterfacePostgreSQL.endTransaction(DBInterfacePostgreSQL.java:1236)
at org.apache.manifoldcf.crawler.system.ManifoldCF.registerConnectors(ManifoldCF.java:605)
at org.apache.manifoldcf.crawler.system.ManifoldCF.reregisterAllConnectors(ManifoldCF.java:160)
at org.apache.manifoldcf.jettyrunner.ManifoldCFJettyRunner.main(ManifoldCFJettyRunner.java:239)
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:766)
at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:1576)
at org.postgresql.jdbc.PgConnection.createStatement(PgConnection.java:367)
at org.apache.manifoldcf.core.database.Database.execute(Database.java:873)
at org.apache.manifoldcf.core.database.Database$ExecuteQueryThread.run(Database.java:696)