manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Performance issues
Date Fri, 18 Jul 2014 15:02:21 GMT
Hi Ameya,

4000 documents is nothing at all.  We have load tests which I run on every
release that include more than 100000 documents on a crawl.

Can you be more specific about the case where you say it "hung up"?
Specifically:

(1) What kind of crawl is this?  SharePoint?  Web?
(2) Are there any errors in the manifoldcf log?
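
For (2), a quick way to check is to filter the log for error-level lines. The following is only a minimal sketch: it assumes log4j-style lines in which the level appears as a separate word, and the function names and the idea of matching on "Exception" are illustrative choices rather than anything ManifoldCF provides.

```python
import re
from typing import Iterable, List

# Assumption: the log uses log4j-style levels (ERROR, FATAL) as whole words,
# and Java stack traces contain the substring "Exception".
PROBLEM = re.compile(r"\b(ERROR|FATAL)\b|Exception")


def find_problem_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that look like errors, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if PROBLEM.search(line)]


def scan_file(path: str) -> List[str]:
    """Scan a log file at a caller-supplied path (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_problem_lines(handle)
```

Calling `scan_file` with the path to your own manifoldcf log returns the suspicious lines, which you can paste into a reply.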

Thanks,
Karl





On Fri, Jul 18, 2014 at 10:59 AM, Ameya Aware <ameya.aware@gmail.com> wrote:

> Hi Karl,
>
> I spent some time going through the PostgreSQL 9.3 manual.
> I configured PostgreSQL for MCF and saw a significant improvement in
> performance.
>
> I ran it yesterday for some 4000 documents. When I started running it again
> today, the performance was very poor, and after 200 documents it hung up.
>
> Is it because of the periodic maintenance it needs?  Also, I would like to
> know where and how exactly the VACUUM FULL command needs to be used.
>
> Thanks,
> Ameya
>
>
> On Thu, Jul 17, 2014 at 2:13 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> It is fine; I am running Postgresql 9.3 here.
>>
>> Karl
>>
>>
>> On Thu, Jul 17, 2014 at 2:08 PM, Ameya Aware <ameya.aware@gmail.com>
>> wrote:
>>
>>> Is PostgreSQL 9.3 OK to use? I already have it on my machine, though
>>> the documentation says "ManifoldCF has been tested against
>>> version 8.3.7, 8.4.5 and 9.1 of PostgreSQL. "
>>>
>>> Ameya
>>>
>>>
>>> On Thu, Jul 17, 2014 at 1:09 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> If you haven't configured MCF to use PostgreSQL, then you are using
>>>> Derby, which is not recommended for production use.
>>>>
>>>> Instructions on how to set up MCF to use PostgreSQL are available on
>>>> the MCF site on the how-to-build-and-deploy page.  Configuring PostgreSQL
>>>> for millions or tens of millions of documents will require someone to learn
>>>> about PostgreSQL and how to administer it.  The how-to-build-and-deploy
>>>> page provides some (old) guidelines and hints, but if I were you I'd read
>>>> the postgresql manual for the version you install.
>>>>
>>>> Karl
>>>>
>>>>
>>>>
>>>> On Thu, Jul 17, 2014 at 1:04 PM, Ameya Aware <ameya.aware@gmail.com>
>>>> wrote:
>>>>
>>>>> Ooh ok.
>>>>>
>>>>> Actually, I have not configured PostgreSQL yet. I am simply using the
>>>>> binary distribution of MCF to configure file system connectors to connect
>>>>> to Solr.
>>>>>
>>>>> Do I need to configure PostgreSQL? How can I proceed from here to
>>>>> check performance measurements?
>>>>>
>>>>> Thanks,
>>>>> Ameya
>>>>>
>>>>>
>>>>> On Thu, Jul 17, 2014 at 12:10 PM, Karl Wright <daddywri@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Yes.  Also have a look at the how-to-build-and-deploy page for hints
>>>>>> on how to configure PostgreSQL for maximum performance.
>>>>>>
>>>>>> ManifoldCF's performance is almost entirely based on the database.
>>>>>> If you are using PostgreSQL, which is the fastest ManifoldCF choice, you
>>>>>> should be able to see in the logs when queries take a long time, or when
>>>>>> indexes are automatically rebuilt.  Could you provide any information as to
>>>>>> what your overall system setup looks like?
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 17, 2014 at 11:32 AM, Ameya Aware <ameya.aware@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/performance-tuning.html
>>>>>>>
>>>>>>> This page?
>>>>>>>
>>>>>>> Ameya
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 17, 2014 at 11:28 AM, Karl Wright <daddywri@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Ameya,
>>>>>>>>
>>>>>>>> Have you read the performance page?
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>> Sent from my Windows Phone
>>>>>>>> ------------------------------
>>>>>>>> From: Ameya Aware
>>>>>>>> Sent: 7/17/2014 11:27 AM
>>>>>>>> To: user@manifoldcf.apache.org
>>>>>>>> Subject: Performance issues
>>>>>>>>
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> I have millions of documents to crawl and send to Solr.
>>>>>>>>
>>>>>>>> But when I run it for thousands of documents, it takes too much time,
>>>>>>>> or sometimes it even hangs up.
>>>>>>>>
>>>>>>>> So what could be the way to reduce the crawl time?
>>>>>>>>
>>>>>>>> Also, I do not need the content of the documents, I just need the
>>>>>>>> metadata, so can I skip reading and fetching the content, and will
>>>>>>>> that improve performance?
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Ameya
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
