manifoldcf-user mailing list archives

From Tamizh Kumaran Thamizharasan <tthamizhara...@worldbankgroup.org>
Subject ManifoldCF slow documentum indexing performance
Date Wed, 05 Jul 2017 12:24:34 GMT
Hi Team,

PostgreSQL 9.2, Solr 5.3.2, and ManifoldCF 2.7.1 are installed on the same Linux box. The
Documentum server sits on a different Linux box. Indexing performance with the Documentum
crawler is slow (approx. 1000 documents per hour). The properties file in use is shown below for reference:

<configuration>
  <!-- Version string for UI -->
  <!-- Point to a specific (common) logging file -->
  <property name="org.apache.manifoldcf.logconfigfile" value="./logging.ini"/>
  <!-- Specify the connectors to be loaded -->
  <property name="org.apache.manifoldcf.connectorsconfigurationfile" value="../connectors.xml"/>
  <!-- Specify the path to the file resources directory -->
  <property name="org.apache.manifoldcf.fileresources" value="../file-resources"/>
  <property name="org.apache.manifoldcf.databaseimplementationclass" value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
  <property name="org.apache.manifoldcf.postgresql.hostname" value="localhost"/>
  <property name="org.apache.manifoldcf.postgresql.port" value="5432"/>
  <property name="org.apache.manifoldcf.dbsuperusername" value="postgres"/>
  <property name="org.apache.manifoldcf.dbsuperuserpassword" value=""/>
  <property name="org.apache.manifoldcf.database.name" value="manifoldcf"/>
  <property name="org.apache.manifoldcf.database.username" value="postgres"/>
  <property name="org.apache.manifoldcf.database.password" value=""/>
  <property name="org.apache.manifoldcf.database.maxhandles" value="100"/>
  <property name="org.apache.manifoldcf.crawler.threads" value="15"/>
  <property name="org.apache.manifoldcf.crawler.repository.store_history" value="false"/>

  <property name="org.apache.manifoldcf.zookeeper.connectstring" value="***********:8349"/>
  <property name="org.apache.manifoldcf.zookeeper.sessiontimeout" value="5000"/>
  <!-- Tell MCF where to find the connector jars -->
  <libdir path="../connector-lib"/>
  <libdir path="../connector-common-lib"/>
  <libdir path="../connector-lib-proprietary"/>
  <!-- Any additional local properties go here -->
</configuration>

Initially, org.apache.manifoldcf.crawler.threads was set to 45, and we observed a long
gap between each batch of 45 documents during processing.
Can you please suggest any changes or recommendations that would speed up indexing?
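
To put the figures above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes (this is not measured) that all 15 configured worker threads are continuously busy; the function names are purely illustrative.

```python
# Back-of-the-envelope estimate from the figures quoted in this message.
DOCS_PER_HOUR = 1000   # approximate observed throughput
WORKER_THREADS = 15    # value of org.apache.manifoldcf.crawler.threads


def seconds_per_document(docs_per_hour: float) -> float:
    """Average wall-clock seconds between completed documents."""
    return 3600.0 / docs_per_hour


def per_thread_seconds(docs_per_hour: float, threads: int) -> float:
    """Average seconds a single thread spends per document.

    ASSUMPTION: every worker thread is busy the whole time. If threads
    sit idle between batches, the real per-document time is lower.
    """
    return seconds_per_document(docs_per_hour) * threads


if __name__ == "__main__":
    overall = seconds_per_document(DOCS_PER_HOUR)
    per_thread = per_thread_seconds(DOCS_PER_HOUR, WORKER_THREADS)
    print(f"{overall:.1f} s per document overall")
    print(f"{per_thread:.1f} s per document per thread")
```

Under that assumption, the crawl completes one document roughly every 3.6 seconds, which works out to about 54 seconds per document for each thread.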

Regards,
Tamizh Kumaran Thamizharasan

