manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Performance issues
Date Fri, 18 Jul 2014 15:12:40 GMT
Could you please get a thread dump and send that to me?  Please send it as a
text file, not a screenshot.

To get a thread dump, find the process ID of the agents process and use the
JDK's jstack utility to obtain the dump.

Thanks,
Karl
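
The steps above can be scripted. This is a minimal sketch that assumes a JDK whose `jps` and `jstack` tools are on the PATH. The hint string `start.jar` is an assumption: it matches the single-process example started with `java -jar start.jar`. If you run a separate agents process, pass part of its main class name instead. The helper names `find_pid`, `dump_threads` and `main` are hypothetical and are not part of ManifoldCF.

```python
import subprocess
from pathlib import Path
from typing import Optional


def find_pid(jps_output: str, main_class_hint: str) -> Optional[int]:
    """Return the PID of the first `jps -l` line whose class or jar contains the hint."""
    for line in jps_output.splitlines():
        parts = line.strip().split(maxsplit=1)
        if len(parts) == 2 and main_class_hint in parts[1]:
            return int(parts[0])
    return None


def dump_threads(pid: int, out_file: str) -> Path:
    """Run `jstack <pid>` and save the dump as a plain-text file (not a screenshot)."""
    result = subprocess.run(
        ["jstack", str(pid)], capture_output=True, text=True, check=True
    )
    path = Path(out_file)
    path.write_text(result.stdout, encoding="utf-8")
    return path


def main(main_class_hint: str = "start.jar") -> Path:
    # Assumption: "start.jar" identifies the MCF example process; adjust as needed.
    jps = subprocess.run(["jps", "-l"], capture_output=True, text=True, check=True)
    pid = find_pid(jps.stdout, main_class_hint)
    if pid is None:
        raise SystemExit("No matching Java process; check the output of `jps -l`.")
    return dump_threads(pid, f"threaddump-{pid}.txt")
```

Call `main()` while the crawl appears stuck, then attach the resulting .txt file.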



On Fri, Jul 18, 2014 at 11:08 AM, Ameya Aware <ameya.aware@gmail.com> wrote:

> Yeah, I thought so too; 4000 documents shouldn't be a problem.
>
> I am using the file system connector to crawl my entire C drive, and the
> output connection is null.
>
> There are no errors in the MCF log. MCF has been stuck at the same screen
> for half an hour.
>
> Attaching some screenshots for your reference.
>
>
> Thanks,
> Ameya
>
>
>
>
> On Fri, Jul 18, 2014 at 11:02 AM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Ameya,
>>
>> 4000 documents is nothing at all.  We have load tests which I run on
>> every release that include more than 100000 documents on a crawl.
>>
>> Can you be more specific about what you mean by "hung up"?
>> Specifically:
>>
>> (1) What kind of crawl is this?  SharePoint?  Web?
>> (2) Are there any errors in the manifoldcf log?
>>
>> Thanks,
>> Karl
>>
>>
>>
>>
>>
>> On Fri, Jul 18, 2014 at 10:59 AM, Ameya Aware <ameya.aware@gmail.com>
>> wrote:
>>
>>> Hi Karl,
>>>
>>> I spent some time going through the PostgreSQL 9.3 manual.
>>> I configured PostgreSQL for MCF and saw a significant improvement in
>>> performance.
>>>
>>> I ran it yesterday on about 4000 documents. When I ran it again today,
>>> performance was very poor, and after 200 documents it hung.
>>>
>>> Is this because of the periodic maintenance it needs?  Also, I would like to
>>> know where and how exactly the VACUUM FULL command should be used.
>>>
>>> Thanks,
>>> Ameya
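
One way to issue VACUUM FULL from outside ManifoldCF is PostgreSQL's `vacuumdb` client. The sketch below assumes the following:

- `vacuumdb` is on the PATH.
- Credentials come from `PGPASSWORD` or `~/.pgpass`.
- The placeholder names `mcfdb` and `mcfuser` stand for whatever database and user your ManifoldCF properties.xml configures.

VACUUM FULL rewrites each table and holds an exclusive lock on it while it runs, so the sketch also assumes ManifoldCF is stopped during maintenance. The function names are hypothetical.

```python
import subprocess
from typing import List


def vacuumdb_command(dbname: str, user: str, host: str = "localhost",
                     full: bool = True, analyze: bool = True) -> List[str]:
    """Build the vacuumdb argument list for a single database."""
    cmd = ["vacuumdb", "-h", host, "-U", user, "-d", dbname]
    if full:
        cmd.append("--full")     # VACUUM FULL: rewrites tables under an exclusive lock
    if analyze:
        cmd.append("--analyze")  # also refresh the planner's statistics
    return cmd


def run_maintenance(dbname: str, user: str, host: str = "localhost") -> None:
    """Run VACUUM FULL ANALYZE; stop ManifoldCF before calling this."""
    # The password is taken from PGPASSWORD or ~/.pgpass, as with other libpq tools.
    subprocess.run(vacuumdb_command(dbname, user, host), check=True)
```

For example, run `run_maintenance("mcfdb", "mcfuser")` with both names replaced by your own.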
>>>
>>>
>>> On Thu, Jul 17, 2014 at 2:13 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> It is fine; I am running PostgreSQL 9.3 here.
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Thu, Jul 17, 2014 at 2:08 PM, Ameya Aware <ameya.aware@gmail.com>
>>>> wrote:
>>>>
>>>>> Is PostgreSQL 9.3 OK? I already have it on my machine, though the
>>>>> documentation says "ManifoldCF has been tested against version 8.3.7,
>>>>> 8.4.5 and 9.1 of PostgreSQL."
>>>>>
>>>>> Ameya
>>>>>
>>>>>
>>>>> On Thu, Jul 17, 2014 at 1:09 PM, Karl Wright <daddywri@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> If you haven't configured MCF to use PostgreSQL, then you are using
>>>>>> Derby, which is not recommended for production use.
>>>>>>
>>>>>> Instructions on how to set up MCF to use PostgreSQL are available on
>>>>>> the MCF site on the how-to-build-and-deploy page.  Configuring PostgreSQL
>>>>>> for millions or tens of millions of documents will require someone to
>>>>>> learn about PostgreSQL and how to administer it.  The
>>>>>> how-to-build-and-deploy page provides some (old) guidelines and hints,
>>>>>> but if I were you I'd read the PostgreSQL manual for the version you
>>>>>> install.
>>>>>>
>>>>>> Karl
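
You can check which database a ManifoldCF install uses by looking at the database implementation class in its properties.xml. This sketch makes two assumptions that you should verify against the how-to-build-and-deploy page for your release:

- The property is named `org.apache.manifoldcf.databaseimplementationclass`.
- Its value names the class, for example `org.apache.manifoldcf.core.database.DBInterfacePostgreSQL`.

Following the note above that an unconfigured MCF runs on Derby, the sketch treats a missing setting as Derby. That is also an assumption. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed property name; check the how-to-build-and-deploy page for your release.
DB_CLASS_PROPERTY = "org.apache.manifoldcf.databaseimplementationclass"


def database_backend(properties_xml: str) -> str:
    """Report which database a properties.xml selects: PostgreSQL, Derby, or the raw class."""
    root = ET.fromstring(properties_xml)
    for prop in root.iter("property"):
        if prop.get("name") == DB_CLASS_PROPERTY:
            value = prop.get("value", "")
            if "PostgreSQL" in value:
                return "PostgreSQL"
            if "Derby" in value:
                return "Derby"
            return value or "unknown"
    # Assumption: no explicit setting means the Derby default.
    return "Derby"
```

Usage: pass the file's text, e.g. `database_backend(open("properties.xml").read())`. The path is hypothetical; point it at the properties.xml your install actually loads.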
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 17, 2014 at 1:04 PM, Ameya Aware <ameya.aware@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Oh, OK.
>>>>>>>
>>>>>>> Actually, I have never configured PostgreSQL. I am simply using the
>>>>>>> binary distribution of MCF, with file system connectors sending
>>>>>>> documents to Solr.
>>>>>>>
>>>>>>> Do I need to configure PostgreSQL? How can I proceed from here to
>>>>>>> measure performance?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ameya
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Jul 17, 2014 at 12:10 PM, Karl Wright <daddywri@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Yes.  Also have a look at the how-to-build-and-deploy page for
>>>>>>>> hints on how to configure PostgreSQL for maximum performance.
>>>>>>>>
>>>>>>>> ManifoldCF's performance is almost entirely based on the database.
>>>>>>>> If you are using PostgreSQL, which is the fastest ManifoldCF choice,
>>>>>>>> you should be able to see in the logs when queries take a long time,
>>>>>>>> or when indexes are automatically rebuilt.  Could you provide any
>>>>>>>> information as to what your overall system setup looks like?
>>>>>>>>
>>>>>>>> Karl
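
Following up on the log hint above, a short scan of the ManifoldCF log can rank the slowest queries. This sketch assumes that slow queries are logged on lines containing `Found a long-running query (<N> ms): [<query>]`. If your log words it differently, adjust `LONG_QUERY`. A query that spans several log lines is captured only up to the end of its first line. The function names are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed message format; change the pattern if your log differs.
LONG_QUERY = re.compile(r"long-running query \((\d+) ms\): \[(.*)")


def slow_queries(lines: Iterable[str], min_ms: int = 0) -> List[Tuple[int, str]]:
    """Return (elapsed_ms, query_text) pairs at or above min_ms, slowest first."""
    hits: List[Tuple[int, str]] = []
    for line in lines:
        match = LONG_QUERY.search(line)
        if match is None:
            continue
        elapsed = int(match.group(1))
        if elapsed >= min_ms:
            # Drop surrounding whitespace and the closing bracket of the log entry.
            hits.append((elapsed, match.group(2).strip().rstrip("]")))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


def print_worst(log_path: str, min_ms: int = 0, limit: int = 10) -> None:
    """Print the slowest logged queries from a log file (the path is up to you)."""
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for elapsed, query in slow_queries(log, min_ms)[:limit]:
            print(f"{elapsed:>9} ms  {query}")
```

For example, `print_worst("logs/manifoldcf.log", min_ms=60000)` lists queries that took a minute or more. The log path is hypothetical.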
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jul 17, 2014 at 11:32 AM, Ameya Aware <
>>>>>>>> ameya.aware@gmail.com> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/performance-tuning.html
>>>>>>>>>
>>>>>>>>> This page?
>>>>>>>>>
>>>>>>>>> Ameya
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Jul 17, 2014 at 11:28 AM, Karl Wright <daddywri@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Ameya,
>>>>>>>>>>
>>>>>>>>>> Have you read the performance page?
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>> Sent from my Windows Phone
>>>>>>>>>> ------------------------------
>>>>>>>>>> From: Ameya Aware
>>>>>>>>>> Sent: 7/17/2014 11:27 AM
>>>>>>>>>> To: user@manifoldcf.apache.org
>>>>>>>>>> Subject: Performance issues
>>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> I have millions of documents to crawl and send to Solr.
>>>>>>>>>>
>>>>>>>>>> But when I run it on thousands of documents, it takes far too long,
>>>>>>>>>> or sometimes it even hangs.
>>>>>>>>>>
>>>>>>>>>> So how can I reduce the processing time?
>>>>>>>>>>
>>>>>>>>>> Also, I do not need the content of the documents, only the
>>>>>>>>>> metadata. Can I skip reading and fetching the content, and will
>>>>>>>>>> that improve performance?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Ameya
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
