manifoldcf-user mailing list archives

From Ameya Aware <ameya.aw...@gmail.com>
Subject Re: Performance issues
Date Fri, 18 Jul 2014 15:08:09 GMT
Yeah, I thought so too: 4000 documents should not affect it.

I am using the file system connector to crawl my entire C drive, and the
output connection is null.

There are no errors in the MCF logs. MCF has been stuck at the same screen
for half an hour.

Attaching some snapshots for your reference.


Thanks,
Ameya




On Fri, Jul 18, 2014 at 11:02 AM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Ameya,
>
> 4000 documents is nothing at all.  We have load tests which I run on every
> release that include more than 100000 documents on a crawl.
>
> Can you be more specific about the case that you say "hung up"?
> Specifically:
>
> (1) What kind of crawl is this?  SharePoint?  Web?
> (2) Are there any errors in the manifoldcf log?
>
> Thanks,
> Karl
>
>
>
>
>
> On Fri, Jul 18, 2014 at 10:59 AM, Ameya Aware <ameya.aware@gmail.com>
> wrote:
>
>> Hi Karl,
>>
>> I spent some time going through the PostgreSQL 9.3 manual.
>> I configured PostgreSQL for MCF and saw a significant change in
>> performance.
>>
>> I ran it yesterday for some 4000 documents. When I started it again
>> today, the performance was very poor, and after 200 documents it hung.
>>
>> Is this because of the periodic maintenance it needs?  Also, I would like
>> to know where and how exactly the VACUUM FULL command should be used.
>>
>> Thanks,
>> Ameya
>>
>>
>> On Thu, Jul 17, 2014 at 2:13 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> It is fine; I am running PostgreSQL 9.3 here.
>>>
>>> Karl
>>>
>>>
>>> On Thu, Jul 17, 2014 at 2:08 PM, Ameya Aware <ameya.aware@gmail.com>
>>> wrote:
>>>
>>>> Is PostgreSQL 9.3 OK? I already have it on my machine, though the
>>>> documentation says "ManifoldCF has been tested against version 8.3.7,
>>>> 8.4.5 and 9.1 of PostgreSQL."
>>>>
>>>> Ameya
>>>>
>>>>
>>>> On Thu, Jul 17, 2014 at 1:09 PM, Karl Wright <daddywri@gmail.com>
>>>> wrote:
>>>>
>>>>> If you haven't configured MCF to use PostgreSQL, then you are using
>>>>> Derby, which is not recommended for production use.
>>>>>
>>>>> Instructions on how to set up MCF to use PostgreSQL are available on
>>>>> the MCF site on the how-to-build-and-deploy page.  Configuring PostgreSQL
>>>>> for millions or tens of millions of documents will require someone to
>>>>> learn about PostgreSQL and how to administer it.  The
>>>>> how-to-build-and-deploy page provides some (old) guidelines and hints,
>>>>> but if I were you I'd read the PostgreSQL manual for the version you
>>>>> install.
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 17, 2014 at 1:04 PM, Ameya Aware <ameya.aware@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Ooh ok.
>>>>>>
>>>>>> Actually, I have not configured PostgreSQL yet. I am simply using the
>>>>>> binary distribution of MCF to configure file system connectors that
>>>>>> connect to Solr.
>>>>>>
>>>>>> Do I need to configure PostgreSQL? How can I proceed from here to
>>>>>> check performance measurements?
>>>>>>
>>>>>> Thanks,
>>>>>> Ameya
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 17, 2014 at 12:10 PM, Karl Wright <daddywri@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Yes.  Also have a look at the how-to-build-and-deploy page for hints
>>>>>>> on how to configure PostgreSQL for maximum performance.
>>>>>>>
>>>>>>> ManifoldCF's performance is almost entirely based on the database.
>>>>>>> If you are using PostgreSQL, which is the fastest ManifoldCF choice,
>>>>>>> you should be able to see in the logs when queries take a long time,
>>>>>>> or when indexes are automatically rebuilt.  Could you provide any
>>>>>>> information as to what your overall system setup looks like?
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 17, 2014 at 11:32 AM, Ameya Aware <ameya.aware@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/performance-tuning.html
>>>>>>>>
>>>>>>>> This page?
>>>>>>>>
>>>>>>>> Ameya
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jul 17, 2014 at 11:28 AM, Karl Wright <daddywri@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Ameya,
>>>>>>>>>
>>>>>>>>> Have you read the performance page?
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>> Sent from my Windows Phone
>>>>>>>>> ------------------------------
>>>>>>>>> From: Ameya Aware
>>>>>>>>> Sent: 7/17/2014 11:27 AM
>>>>>>>>> To: user@manifoldcf.apache.org
>>>>>>>>> Subject: Performance issues
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have millions of documents to crawl and send to Solr.
>>>>>>>>>
>>>>>>>>> But when I run it for a few thousand documents, it takes too much
>>>>>>>>> time, or sometimes it even hangs.
>>>>>>>>>
>>>>>>>>> So what could be done to reduce the crawl time?
>>>>>>>>>
>>>>>>>>> Also, I do not need the content of the documents, just the
>>>>>>>>> metadata. Can I skip reading and fetching the content, and will
>>>>>>>>> that improve performance?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Ameya
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
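
The advice quoted above, that with PostgreSQL you should be able to see in the logs when queries take a long time, can be sketched in Python as a small log scan. This is only a sketch under stated assumptions: the log line shape matched by the regular expression, the helper name slow_queries, and the 60-second default threshold are all hypothetical choices made for illustration, not a documented ManifoldCF format. Adjust the pattern to whatever your own manifoldcf.log actually contains.

```python
import re
from typing import Iterable, List, Tuple

# ASSUMPTION: slow-query warnings look roughly like
#   "... long-running query (75000 ms): [SELECT ...]"
# This shape is a guess for illustration; check your own log and adapt it.
LONG_QUERY = re.compile(
    r"long[- ]running query \((\d+) ms\):\s*(.*)", re.IGNORECASE
)


def slow_queries(lines: Iterable[str], min_ms: int = 60000) -> List[Tuple[int, str]]:
    """Return (elapsed_ms, query_text) pairs at or above min_ms, slowest first.

    slow_queries is a hypothetical helper, not part of ManifoldCF.
    """
    hits = []
    for line in lines:
        match = LONG_QUERY.search(line)
        if match is None:
            continue  # not a slow-query line under the assumed pattern
        elapsed_ms = int(match.group(1))
        if elapsed_ms >= min_ms:
            hits.append((elapsed_ms, match.group(2).strip()))
    # Slowest queries first, so the worst offenders are easy to spot.
    return sorted(hits, key=lambda hit: hit[0], reverse=True)
```

To use it, open the log file and pass the file object to slow_queries, since a file object iterates line by line. Lowering min_ms shows more of the queries that were reported as slow.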
