manifoldcf-user mailing list archives

From: Tamizh Kumaran Thamizharasan <tthamizhara...@worldbankgroup.org>
Subject: RE: ManifoldCF slow documentum indexing performance
Date: Wed, 12 Jul 2017 09:53:48 GMT

Thanks Karl and Furkan!

After pointing to a different Documentum instance, the performance issue was resolved.
So it looks like a Documentum issue.

Regards,
Tamizh Kumaran

From: Furkan KAMACI [mailto:furkankamaci@gmail.com]
Sent: Thursday, July 06, 2017 3:22 PM
To: user@manifoldcf.apache.org
Cc: Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
Subject: Re: ManifoldCF slow documentum indexing performance

Hi Tamizh,

Set Xmx and Xms to the same value for better performance.
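
As a rough sketch of that suggestion: the run script quoted further down this thread uses -Xmx512m -Xms32m, so the heap starts small and grows on demand. Setting both flags to one value could look like the fragment below. The variable names HEAP_SIZE and JAVA_HEAP_OPTS are hypothetical and are not taken from the actual Documentum server run script, and 512m is only the existing maximum reused as an example.

```shell
# Hypothetical heap size; choose a value that fits the free memory on the host.
HEAP_SIZE="512m"

# Use the same initial (-Xms) and maximum (-Xmx) heap so the JVM
# does not have to grow the heap while the process is running.
JAVA_HEAP_OPTS="-Xms${HEAP_SIZE} -Xmx${HEAP_SIZE}"

# Show the resulting options; the java command line in the run script
# would take these in place of "-Xmx512m -Xms32m".
echo "$JAVA_HEAP_OPTS"
```

This only removes heap resizing as a factor; it will not help if the time is spent waiting on the Documentum repository itself.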

Kind Regards,
Furkan KAMACI

On Thu, Jul 6, 2017 at 9:10 AM, Karl Wright <daddywri@gmail.com> wrote:
Hi Tamizh,

The Documentum Server Process is a thin shell around DFC and its dependencies.  In order to
get helpful suggestions, you will need to contact Documentum, I'm afraid.

Thanks,
Karl



On Thu, Jul 6, 2017 at 1:57 AM, Tamizh Kumaran Thamizharasan <tthamizharasan@worldbankgroup.org> wrote:
Thanks Karl!!

After monitoring the CPU usage of PostgreSQL, the agents process, and the Documentum server
process, we found that the Documentum server process consumes the most CPU and the agents
process is the second-largest consumer.

In the Documentum server run script, the Java heap is set as follows:
-Xmx512m -Xms32m

Is there any way to speed up indexing through heap configuration or additional hardware?
If so, kindly share the details.

Regards,
Tamizh Kumaran

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Wednesday, July 05, 2017 6:19 PM
To: user@manifoldcf.apache.org
Cc: Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
Subject: Re: ManifoldCF slow documentum indexing performance

Hi Tamizh,

The likely culprit is Documentum itself.  In my experience it can be quite slow, depending
on how it is configured.  But you can confirm that by monitoring the CPU usage of Postgresql,
the agents process, and the documentum server process.  If none of these are CPU bound, then
Documentum itself is the problem.
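
One way to take that measurement on a Linux box is sketched below. The three search patterns are assumptions and need to be adjusted to whatever the processes are actually called on your system; show_cpu is a hypothetical helper name.

```shell
# Print PID, CPU %, memory % and command line for every process
# whose command line matches the given pattern (case-insensitive).
show_cpu() {
  pattern="$1"
  echo "== $pattern =="
  # "grep -v grep" drops the grep process itself from the listing;
  # "|| true" keeps the function from failing when nothing matches.
  ps -eo pid,pcpu,pmem,args | grep -i -- "$pattern" | grep -v grep || true
}

# Assumed patterns for PostgreSQL, the ManifoldCF agents process,
# and the Documentum server process.
for p in postgres manifoldcf documentum; do
  show_cpu "$p"
done
```

Note that the CPU figure reported by ps is an average over the lifetime of the process. For a live view while a crawl is running, top gives a better picture.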

Thanks,
Karl


On Wed, Jul 5, 2017 at 8:24 AM, Tamizh Kumaran Thamizharasan <tthamizharasan@worldbankgroup.org> wrote:
Hi Team,

PostgreSQL 9.2, Solr 5.3.2, and ManifoldCF 2.7.1 are installed on the same Linux box. The
Documentum server sits on a different Linux box. Indexing performance with the Documentum
crawler is slow (approx. 1000 documents per hour). The properties file in use is shown below
for reference.

<configuration>
  <!-- Version string for UI -->
  <!-- Point to a specific (common) logging file -->
  <property name="org.apache.manifoldcf.logconfigfile" value="./logging.ini"/>
  <!-- Specify the connectors to be loaded -->
  <property name="org.apache.manifoldcf.connectorsconfigurationfile" value="../connectors.xml"/>
  <!-- Specify the path to the file resources directory -->
  <property name="org.apache.manifoldcf.fileresources" value="../file-resources"/>
  <property name="org.apache.manifoldcf.databaseimplementationclass" value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
  <property name="org.apache.manifoldcf.postgresql.hostname" value="localhost"/>
  <property name="org.apache.manifoldcf.postgresql.port" value="5432"/>
  <property name="org.apache.manifoldcf.dbsuperusername" value="postgres"/>
  <property name="org.apache.manifoldcf.dbsuperuserpassword" value=""/>
  <property name="org.apache.manifoldcf.database.name" value="manifoldcf"/>
  <property name="org.apache.manifoldcf.database.username" value="postgres"/>
  <property name="org.apache.manifoldcf.database.password" value=""/>
  <property name="org.apache.manifoldcf.database.maxhandles" value="100"/>
  <property name="org.apache.manifoldcf.crawler.threads" value="15"/>
  <property name="org.apache.manifoldcf.crawler.repository.store_history" value="false"/>

  <property name="org.apache.manifoldcf.zookeeper.connectstring" value="***********:8349"/>
  <property name="org.apache.manifoldcf.zookeeper.sessiontimeout" value="5000"/>
  <!-- Tell MCF where to find the connector jars -->
  <libdir path="../connector-lib"/>
  <libdir path="../connector-common-lib"/>
  <libdir path="../connector-lib-proprietary"/>
  <!-- Any additional local properties go here -->
</configuration>

Initially, org.apache.manifoldcf.crawler.threads was set to 45, and we observed a long gap
between each batch of 45 documents during processing.
Can you please point out any changes or recommendations that would speed up indexing?

Regards,
Tamizh Kumaran Thamizharasan



