manifoldcf-user mailing list archives

From "Sivakoti, Nikhilesh" <nikhilesh.sivak...@capgemini.com>
Subject RE: Manifold crawler issue | Alfresco | Not able to crawl large set of data
Date Sat, 01 Dec 2018 06:02:51 GMT
Hi Rafa,

Yes, it deals with the Alfresco REST API. But that REST API fetches the nodes through its Hibernate
configuration, so the API queries the database tables (e.g. alf_node, alf_node_properties) based
on the transaction ID.

We have more than 81 million transactions in the database, and the job hangs while trying to
process that many documents.

_______________________________________________________________________
Nikhilesh Sivakoti
Senior Consultant | I&D ECM

Capgemini India | Bangalore
Tel.: +8099062 – Mob.: + 91 81236 25125
www.capgemini.com

People matter, results count.
_______________________________________________________________________

From: Rafa Haro [mailto:rharo@apache.org]
Sent: Friday, November 30, 2018 10:19 PM
To: user@manifoldcf.apache.org
Subject: Re: Manifold crawler issue | Alfresco | Not able to crawl large set of data

Hi, you said before that you were using the Alfresco Webscript connector. That connector deals
directly with the content REST APIs; it has nothing to do with database transactions. Can you
clarify that, please?

Cheers,
Rafa

On Fri, Nov 30, 2018 at 5:25 PM Sivakoti, Nikhilesh <nikhilesh.sivakoti@capgemini.com>
wrote:
Hi Karl,

The ManifoldCF crawler is failing to crawl all the transactions in the Alfresco server. The
Alfresco DB has more than 81 million transactions.
Can we tune the performance of the ManifoldCF server to handle that many transactions?

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Friday, November 30, 2018 9:00 PM
To: user@manifoldcf.apache.org
Subject: Re: Manifold crawler issue | Alfresco | Not able to crawl large set of data

I'm sorry, you'll need to provide more details about what exactly you are running into trouble
with.

Specifically, this: "But the current crawler using the SQL queries which is hard to query
under a path."

Karl

On Fri, Nov 30, 2018 at 4:42 AM Sivakoti, Nikhilesh <nikhilesh.sivakoti@capgemini.com>
wrote:
Hi Team,

We are migrating our indexes from GSA to Elasticsearch, using the ManifoldCF crawler with the
Alfresco Webscript connector.
The crawler is able to crawl a small number of indexes, but it fails to crawl the indexes in
the QA environment.
We have more than 80 million transactions in QA, which makes the indexes hard to crawl.

Is there anything we could do to migrate in phases? Or is there Lucene support for querying
the contents? We need to query the contents under a specific path. But the current
crawler using the SQL queries which is hard to query under a path.

Kindly help me with this.

Thanks,
Nikhilesh
