manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: ManifoldCF slow documentum indexing performance
Date Thu, 06 Jul 2017 06:10:07 GMT
Hi Tamizh,

The Documentum Server Process is a thin shell around DFC and its
dependencies.  In order to get helpful suggestions, you will need to
contact Documentum, I'm afraid.

Thanks,
Karl



On Thu, Jul 6, 2017 at 1:57 AM, Tamizh Kumaran Thamizharasan <
tthamizharasan@worldbankgroup.org> wrote:

> Thanks Karl!!
>
>
>
> After monitoring the CPU usage of Postgresql, the agents process, and the
> documentum server process, I found that the documentum server process
> consumes most of the CPU and the agents process is the second-largest consumer.
>
>
>
> In the documentum server run script, the Java heap is set as below.
>
> *-Xmx512m -Xms32m*
>
>
>
> Is there any way to speed up the indexing through heap configuration or
> increasing hardware?
>
> If so, kindly share the details with us.
>
>
>
> Regards,
>
> Tamizh Kumaran
>
>
>
> *From:* Karl Wright [mailto:daddywri@gmail.com]
> *Sent:* Wednesday, July 05, 2017 6:19 PM
> *To:* user@manifoldcf.apache.org
> *Cc:* Sharnel Merdeck Pereira; Sundarapandian Arumaidurai Vethasigamani
> *Subject:* Re: ManifoldCF slow documentum indexing performance
>
>
>
> Hi Tamizh,
>
>
>
> The likely culprit is Documentum itself.  In my experience it can be quite
> slow, depending on how it is configured.  But you can confirm that by
> monitoring the CPU usage of Postgresql, the agents process, and the
> documentum server process.  If none of these are CPU bound, then Documentum
> itself is the problem.
>
>
>
> Thanks,
>
> Karl
>
>
>
>
>
> On Wed, Jul 5, 2017 at 8:24 AM, Tamizh Kumaran Thamizharasan <
> tthamizharasan@worldbankgroup.org> wrote:
>
> Hi Team,
>
>
>
> Postgresql 9.2, solr 5.3.2 and manifoldcf 2.7.1 are installed on the
> same linux box. The documentum server sits on a different linux box.
> Indexing with the documentum crawler is slow (approx 1000 docs per hour).
> The properties file in use is below for reference.
>
>
>
> <configuration>
>   <!-- Version string for UI -->
>   <!-- Point to a specific (common) logging file -->
>   <property name="org.apache.manifoldcf.logconfigfile" value="./logging.ini"/>
>   <!-- Specify the connectors to be loaded -->
>   <property name="org.apache.manifoldcf.connectorsconfigurationfile" value="../connectors.xml"/>
>   <!-- Specify the path to the file resources directory -->
>   <property name="org.apache.manifoldcf.fileresources" value="../file-resources"/>
>   <property name="org.apache.manifoldcf.databaseimplementationclass" value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
>   <property name="org.apache.manifoldcf.postgresql.hostname" value="localhost"/>
>   <property name="org.apache.manifoldcf.postgresql.port" value="5432"/>
>   <property name="org.apache.manifoldcf.dbsuperusername" value="postgres"/>
>   <property name="org.apache.manifoldcf.dbsuperuserpassword" value=""/>
>   <property name="org.apache.manifoldcf.database.name" value="manifoldcf"/>
>   <property name="org.apache.manifoldcf.database.username" value="postgres"/>
>   <property name="org.apache.manifoldcf.database.password" value=""/>
>   <property name="org.apache.manifoldcf.database.maxhandles" value="100"/>
>   <property name="org.apache.manifoldcf.crawler.threads" value="15"/>
>   <property name="org.apache.manifoldcf.crawler.repository.store_history" value="false"/>
>
>   <property name="org.apache.manifoldcf.zookeeper.connectstring" value="***********:8349"/>
>   <property name="org.apache.manifoldcf.zookeeper.sessiontimeout" value="5000"/>
>   <!-- Tell MCF where to find the connector jars -->
>   <libdir path="../connector-lib"/>
>   <libdir path="../connector-common-lib"/>
>   <libdir path="../connector-lib-proprietary"/>
>   <!-- Any additional local properties go here -->
> </configuration>
>
>
>
> Initially org.apache.manifoldcf.crawler.threads was set to 45, and we
> observed a long gap between each batch of 45 documents during
> processing.
>
> Can you please point out any changes/recommendations that will speed up
> the indexing.
>
>
>
> Regards,
>
> Tamizh Kumaran Thamizharasan
>
>
>
>
>

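The CPU check suggested in the thread, and the throughput figure quoted in the first message, can be put into numbers with a short script. What follows is only a minimal sketch in Python, not anything shipped with ManifoldCF: the function names are made up for illustration, and the keywords used to pick out the Postgresql, agents, and documentum server processes are assumptions about what appears on each command line, so adjust them to your installation. Note also that the %CPU column reported by ps on Linux is averaged over the lifetime of the process, so it is only a rough indicator; run the script on whichever host runs each process.

```python
import subprocess


def seconds_per_document(docs_per_hour, worker_threads):
    """Average worker-thread time per document, assuming every thread stays busy."""
    return 3600.0 * worker_threads / docs_per_hour


def rank_by_cpu(ps_output, keywords):
    """Parse `ps -eo pcpu,args` output and return (cpu_percent, command) pairs
    whose command line contains one of the keywords, highest CPU first."""
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        try:
            cpu = float(parts[0])
        except ValueError:
            continue
        command = parts[1]
        if any(k.lower() in command.lower() for k in keywords):
            rows.append((cpu, command))
    return sorted(rows, key=lambda row: row[0], reverse=True)


def ps_snapshot():
    """Return the current process table as text (requires the `ps` command)."""
    result = subprocess.run(
        ["ps", "-eo", "pcpu,args"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

With the numbers from the thread, seconds_per_document(1000, 15) gives 54.0, i.e. each document occupies a worker thread for close to a minute on average if all 15 threads are kept busy. Calling rank_by_cpu(ps_snapshot(), ["postgres", "manifoldcf", "documentum"]) lists the matching processes with the heaviest CPU consumer first.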