manifoldcf-user mailing list archives

From Tamizh Kumaran Thamizharasan <tthamizhara...@worldbankgroup.org>
Subject RE: ManifoldCF slow documentum indexing performance
Date Thu, 06 Jul 2017 05:57:43 GMT
Thanks Karl!!

After monitoring the CPU usage of PostgreSQL, the agents process, and the Documentum server
process, I found that the Documentum server process consumes the most CPU, and the agents
process is the second-largest consumer.

In the Documentum server run script, the Java heap is set as follows:
-Xmx512m -Xms32m

Is there any way to speed up the indexing through heap configuration or by adding hardware?
If so, kindly share the details.

Regards,
Tamizh Kumaran

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 05, 2017 6:19 PM
To: user@manifoldcf.apache.org
Cc: Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
Subject: Re: ManifoldCF slow documentum indexing performance

Hi Tamizh,

The likely culprit is Documentum itself.  In my experience it can be quite slow, depending
on how it is configured.  But you can confirm that by monitoring the CPU usage of Postgresql,
the agents process, and the documentum server process.  If none of these are CPU bound, then
Documentum itself is the problem.

Thanks,
Karl


On Wed, Jul 5, 2017 at 8:24 AM, Tamizh Kumaran Thamizharasan <tthamizharasan@worldbankgroup.org>
wrote:
Hi Team,

PostgreSQL 9.2, Solr 5.3.2, and ManifoldCF 2.7.1 are installed on the same Linux box. The
Documentum server sits on a different Linux box. Indexing performance is slow (approx. 1000
documents per hour) with the Documentum crawler. The properties file in use is below for reference:

<configuration>
  <!-- Version string for UI -->
  <!-- Point to a specific (common) logging file -->
  <property name="org.apache.manifoldcf.logconfigfile" value="./logging.ini"/>
  <!-- Specify the connectors to be loaded -->
  <property name="org.apache.manifoldcf.connectorsconfigurationfile" value="../connectors.xml"/>
  <!-- Specify the path to the file resources directory -->
  <property name="org.apache.manifoldcf.fileresources" value="../file-resources"/>
  <property name="org.apache.manifoldcf.databaseimplementationclass" value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
  <property name="org.apache.manifoldcf.postgresql.hostname" value="localhost"/>
  <property name="org.apache.manifoldcf.postgresql.port" value="5432"/>
  <property name="org.apache.manifoldcf.dbsuperusername" value="postgres"/>
  <property name="org.apache.manifoldcf.dbsuperuserpassword" value=""/>
  <property name="org.apache.manifoldcf.database.name" value="manifoldcf"/>
  <property name="org.apache.manifoldcf.database.username" value="postgres"/>
  <property name="org.apache.manifoldcf.database.password" value=""/>
  <property name="org.apache.manifoldcf.database.maxhandles" value="100"/>
  <property name="org.apache.manifoldcf.crawler.threads" value="15"/>
  <property name="org.apache.manifoldcf.crawler.repository.store_history" value="false"/>

  <property name="org.apache.manifoldcf.zookeeper.connectstring" value="***********:8349"/>
  <property name="org.apache.manifoldcf.zookeeper.sessiontimeout" value="5000"/>
  <!-- Tell MCF where to find the connector jars -->
  <libdir path="../connector-lib"/>
  <libdir path="../connector-common-lib"/>
  <libdir path="../connector-lib-proprietary"/>
  <!-- Any additional local properties go here -->
</configuration>

Initially, org.apache.manifoldcf.crawler.threads was set to 45, and we observed a long gap
between each batch of 45 documents during processing.
Can you please point out any changes or recommendations that would speed up the indexing?

Regards,
Tamizh Kumaran Thamizharasan
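
As a rough reading of the figures quoted in this thread, the sketch below converts the reported
throughput (about 1000 documents per hour) and the configured worker thread count (15) into an
average time per document per thread. It assumes every worker thread is busy for the whole hour,
which the thread does not confirm, and the function name is made up for illustration. A large
per-document time would be consistent with the crawler waiting on a slow repository rather than
lacking threads.

```python
# Back-of-the-envelope estimate using the numbers reported in the thread.
DOCS_PER_HOUR = 1000   # observed throughput ("approx 1000 doc per hour")
WORKER_THREADS = 15    # value of org.apache.manifoldcf.crawler.threads


def seconds_per_document(docs_per_hour: float, threads: int) -> float:
    """Average wall-clock seconds one thread spends on one document.

    Assumption (not stated in the thread): all threads are kept busy
    for the entire hour and share the work evenly.
    """
    docs_per_thread_per_hour = docs_per_hour / threads
    return 3600.0 / docs_per_thread_per_hour


if __name__ == "__main__":
    latency = seconds_per_document(DOCS_PER_HOUR, WORKER_THREADS)
    print(f"about {latency:.0f} s per document per thread")
```

Under the same assumption, the earlier setting of 45 threads at the same throughput would imply
roughly three times that per-document time, so adding threads alone would not be expected to help.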

