manifoldcf-user mailing list archives

From Ameya Aware <ameya.aw...@gmail.com>
Subject Re: Performance issues
Date Fri, 18 Jul 2014 19:08:26 GMT
Hi

Am I not supposed to run the 'ant build' command after changing the
properties.xml file?

Because that is what set my configured PostgreSQL back to Derby.

Ameya
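
One way to confirm which database a properties.xml selects after a rebuild is to read the implementation-class property back out of the file. The sketch below is illustrative only: the property name and the class names are assumptions based on the ManifoldCF how-to-build-and-deploy documentation, and the sample XML is hypothetical, so check both against the version you actually run.

```python
import xml.etree.ElementTree as ET

# Assumed property name; verify it against your ManifoldCF release.
DB_PROPERTY = "org.apache.manifoldcf.databaseimplementationclass"


def database_implementation(properties_xml_text):
    """Return the configured database implementation class, or None if unset."""
    root = ET.fromstring(properties_xml_text)
    for prop in root.iter("property"):
        if prop.get("name") == DB_PROPERTY:
            return prop.get("value")
    return None


# Hypothetical file content, for illustration only.
SAMPLE = """<configuration>
  <property name="org.apache.manifoldcf.databaseimplementationclass"
            value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
</configuration>"""

if __name__ == "__main__":
    # Prints the class name found in the sample configuration.
    print(database_implementation(SAMPLE))
```

If the function returns None or a Derby class after a rebuild, the edited file was probably overwritten or a different copy is being read.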


On Fri, Jul 18, 2014 at 2:27 PM, Karl Wright <daddywri@gmail.com> wrote:

> Yes.
> Karl
>
>
> On Fri, Jul 18, 2014 at 2:26 PM, Ameya Aware <ameya.aware@gmail.com>
> wrote:
>
>> So for the Hop Filters tab:
>> [image: Inline image 1]
>>
>> are you suggesting choosing the 3rd option, i.e. "Keep unreachable
>> documents, forever"?
>>
>>
>> Thanks,
>> Ameya
>>
>>
>> On Fri, Jul 18, 2014 at 2:15 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Something else you should be aware of: Hop-count filtering is very
>>> expensive.  If you are using a connector that uses it, and you don't need
>>> it, you should consider disabling it.  Pick the bottom radio button on the
>>> Hop Count tab to do that.
>>>
>>> Thanks,
>>> Karl
>>>
>>>
>>>
>>>
>>> On Fri, Jul 18, 2014 at 1:34 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Hi Ameya,
>>>>
>>>> If you are still using Derby, which apparently you are according to the
>>>> stack trace, then a pause of 420 seconds is likely because Derby got itself
>>>> stuck.  Derby is like that, which is why we don't recommend it for
>>>> production.
>>>>
>>>> Karl
>>>>
>>>>
>>>>
>>>> On Fri, Jul 18, 2014 at 1:31 PM, Ameya Aware <ameya.aware@gmail.com>
>>>> wrote:
>>>>
>>>>> No Karl,
>>>>>
>>>>> I did not do VACUUM here.
>>>>>
>>>>> Why would queries stop after running for about 420 sec? Is it
>>>>> because of the errors coming in?
>>>>>
>>>>>
>>>>> On Fri, Jul 18, 2014 at 12:32 PM, Karl Wright <daddywri@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Ameya,
>>>>>>
>>>>>> For future reference, when you see stuff like this in the log:
>>>>>>
>>>>>> >>>>>>
>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') - Found a long-running query (458934 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '4') - Found a long-running query (420965 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 0: 'D'
>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '19') - Found a long-running query (421120 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '10') - Found a long-running query (420985 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '11') - Found a long-running query (421173 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '4') -   Parameter 0: 'D'
>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '11') -   Parameter 0: 'D'
>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '10') -   Parameter 0: 'D'
>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 1: '-1'
>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '19') -   Parameter 0: 'D'
>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 2: '1405692432586'
>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '10') -   Parameter 1: '-1'
>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '22') - Found a long-running query (421052 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '11') -   Parameter 1: '-1'
>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '4') -   Parameter 1: '-1'
>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '11') -   Parameter 2: '1405692432586'
>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 0: 'D'
>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '10') -   Parameter 2: '1405692432586'
>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 3: '9ABFEB709B646CD0C84B4B7B6300E2C9BD5E3477'
>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '19') -   Parameter 1: '-1'
>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '39') -   Parameter 4: 'B'
>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '10') -   Parameter 3: 'A932EC77CEF156EA26A4239F12BAB365E6B4F58D'
>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 1: '-1'
>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '11') -   Parameter 3: '9DFF75EBE13D0AAE8AFF025E992C68AB203ED1CB'
>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '4') -   Parameter 2: '1405692432586'
>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '11') -   Parameter 4: 'B'
>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 2: '1405692432586'
>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 3: '023FDBD3638711F4E55A918B862A064161B0892A'
>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 4: 'B'
>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '10') -   Parameter 4: 'B'
>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '19') -   Parameter 2: '1405692432586'
>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '4') -   Parameter 3: '0158B8EDFEE3DDB10113B6D6E378D5FBF165E1FD'
>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '19') -   Parameter 3: 'FD9641C67D0C1EC22B5F05671513D4DD71B4582C'
>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '4') -   Parameter 4: 'B'
>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '19') -   Parameter 4: 'B'
>>>>>> <<<<<<
>>>>>>
>>>>>> ... it means that MANY queries basically stopped running for about
>>>>>> 420 seconds.  I bet you did a VACUUM then, right?
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 18, 2014 at 12:30 PM, Karl Wright <daddywri@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Ameya,
>>>>>>>
>>>>>>> The log file is full of errors of all sorts.  For example:
>>>>>>>
>>>>>>> >>>>>
>>>>>>>  WARN 2014-07-17 17:32:38,709 (Worker thread '41') - IO exception during indexing file:/C:/Program%20Files/eclipse/configuration/org.eclipse.osgi/.manager/.tmp2043698995563843992.instance: The process cannot access the file because another process has locked a portion of the file
>>>>>>> java.io.IOException: The process cannot access the file because another process has locked a portion of the file
>>>>>>>     at java.io.FileInputStream.readBytes(Native Method)
>>>>>>>     at java.io.FileInputStream.read(Unknown Source)
>>>>>>>     at org.apache.http.entity.mime.content.InputStreamBody.writeTo(InputStreamBody.java:91)
>>>>>>>     at org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.doWriteTo(ModifiedHttpMultipart.java:211)
>>>>>>>     at org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.writeTo(ModifiedHttpMultipart.java:229)
>>>>>>>     at org.apache.manifoldcf.agents.output.solr.ModifiedMultipartEntity.writeTo(ModifiedMultipartEntity.java:187)
>>>>>>>     at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>>>>>>>     at java.lang.reflect.Method.invoke(Unknown Source)
>>>>>>>     at org.apache.http.impl.execchain.RequestEntityExecHandler.invoke(RequestEntityExecHandler.java:77)
>>>>>>>     at com.sun.proxy.$Proxy0.writeTo(Unknown Source)
>>>>>>>     at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:155)
>>>>>>>     at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>>>>>>>     at java.lang.reflect.Method.invoke(Unknown Source)
>>>>>>>     at org.apache.http.impl.conn.CPoolProxy.invoke(CPoolProxy.java:138)
>>>>>>>     at com.sun.proxy.$Proxy1.sendRequestEntity(Unknown Source)
>>>>>>>     at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
>>>>>>>     at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
>>>>>>>     at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:254)
>>>>>>>     at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
>>>>>>>     at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
>>>>>>>     at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
>>>>>>>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>>>>>>>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
>>>>>>>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
>>>>>>>     at org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:292)
>>>>>>>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
>>>>>>>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>>>>>>>     at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:951)
>>>>>>> <<<<<
>>>>>>>
>>>>>>> This error occurs because you are trying to index a file on Windows
>>>>>>> that is open in another application.  When this happens, ManifoldCF
>>>>>>> will requeue the document and try it again later (say, in 5 minutes),
>>>>>>> and keep retrying it for many hours before it gives up.
>>>>>>>
>>>>>>> I suspect that you are not seeing "hangs", but rather situations
>>>>>>> where MCF is simply waiting for a problem to resolve.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 18, 2014 at 11:27 AM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>
>>>>>>>> Attaching log file
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 18, 2014 at 11:15 AM, Karl Wright <daddywri@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Also, please send the file logs/manifoldcf.log as well, as a
>>>>>>>>> text file.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jul 18, 2014 at 11:12 AM, Karl Wright <daddywri@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Could you please get a thread dump and send that to me?  Please
>>>>>>>>>> send it as a text file, not a screenshot.
>>>>>>>>>>
>>>>>>>>>> To get a thread dump, get the process ID of the agents process,
>>>>>>>>>> and use the JDK's jstack utility to obtain the dump.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 18, 2014 at 11:08 AM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Yeah, I also thought that 4000 documents should not be a problem.
>>>>>>>>>>>
>>>>>>>>>>> I am using the filesystem connector to crawl all of my C drive, and
>>>>>>>>>>> the output connection is null.
>>>>>>>>>>>
>>>>>>>>>>> There are no error logs in MCF. MCF has been at a standstill on the
>>>>>>>>>>> same screen for half an hour.
>>>>>>>>>>>
>>>>>>>>>>> Attaching some snapshots for your reference.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Ameya
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 18, 2014 at 11:02 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Ameya,
>>>>>>>>>>>>
>>>>>>>>>>>> 4000 documents is nothing at all.  We have load tests which I
>>>>>>>>>>>> run on every release that include more than 100000 documents
>>>>>>>>>>>> on a crawl.
>>>>>>>>>>>>
>>>>>>>>>>>> Can you be more specific about the case that you say "hung
>>>>>>>>>>>> up"?  Specifically:
>>>>>>>>>>>>
>>>>>>>>>>>> (1) What kind of crawl is this?  SharePoint?  Web?
>>>>>>>>>>>> (2) Are there any errors in the manifoldcf log?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 18, 2014 at 10:59 AM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I spent some time going through the PostgreSQL 9.3 manual.
>>>>>>>>>>>>> I configured PostgreSQL for MCF and saw a significant improvement
>>>>>>>>>>>>> in performance.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I ran it yesterday for some 4000 documents. When I started
>>>>>>>>>>>>> running it again today, the performance was very poor, and after
>>>>>>>>>>>>> 200 documents it hung.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is it because of the periodic maintenance it needs?  Also, I would
>>>>>>>>>>>>> like to know where and how exactly the VACUUM FULL command needs
>>>>>>>>>>>>> to be used.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 2:13 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is fine; I am running PostgreSQL 9.3 here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 2:08 PM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is PostgreSQL version 9.3 OK? I already have it on
>>>>>>>>>>>>>>> my machine, though the documentation says "ManifoldCF has been
>>>>>>>>>>>>>>> tested against version 8.3.7, 8.4.5 and 9.1 of PostgreSQL."
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 1:09 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If you haven't configured MCF to use PostgreSQL, then you
>>>>>>>>>>>>>>>> are using Derby, which is not recommended for production use.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Instructions on how to set up MCF to use PostgreSQL are
>>>>>>>>>>>>>>>> available on the MCF site on the how-to-build-and-deploy page.
>>>>>>>>>>>>>>>> Configuring PostgreSQL for millions or tens of millions of
>>>>>>>>>>>>>>>> documents will require someone to learn about PostgreSQL and
>>>>>>>>>>>>>>>> how to administer it.  The how-to-build-and-deploy page
>>>>>>>>>>>>>>>> provides some (old) guidelines and hints, but if I were you
>>>>>>>>>>>>>>>> I'd read the PostgreSQL manual for the version you install.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 1:04 PM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ooh ok.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Actually, I have never configured PostgreSQL. I am
>>>>>>>>>>>>>>>>> simply using the binary distribution of MCF to configure file
>>>>>>>>>>>>>>>>> system connectors to connect to Solr.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Do I need to configure PostgreSQL? How can I proceed
>>>>>>>>>>>>>>>>> from here to check performance measurements?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 12:10 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Yes.  Also have a look at the how-to-build-and-deploy
>>>>>>>>>>>>>>>>>> page for hints on how to configure PostgreSQL for maximum
>>>>>>>>>>>>>>>>>> performance.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ManifoldCF's performance is almost entirely based on the
>>>>>>>>>>>>>>>>>> database.  If you are using PostgreSQL, which is the fastest
>>>>>>>>>>>>>>>>>> ManifoldCF choice, you should be able to see in the logs when
>>>>>>>>>>>>>>>>>> queries take a long time, or when indexes are automatically
>>>>>>>>>>>>>>>>>> rebuilt.  Could you provide any information as to what your
>>>>>>>>>>>>>>>>>> overall system setup looks like?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 11:32 AM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/performance-tuning.html
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> This page?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 11:28 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Ameya,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Have you read the performance page?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Sent from my Windows Phone
>>>>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>>>>> From: Ameya Aware
>>>>>>>>>>>>>>>>>>>> Sent: 7/17/2014 11:27 AM
>>>>>>>>>>>>>>>>>>>> To: user@manifoldcf.apache.org
>>>>>>>>>>>>>>>>>>>> Subject: Performance issues
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I have millions of documents to crawl and send to
>>>>>>>>>>>>>>>>>>>> Solr.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> But when I run it for thousands of documents, it takes too
>>>>>>>>>>>>>>>>>>>> much time, or sometimes it even hangs.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> So what could be a way to reduce the crawl time?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Also, I do not need the content of the documents, I just
>>>>>>>>>>>>>>>>>>>> need metadata, so can I skip reading and fetching the content,
>>>>>>>>>>>>>>>>>>>> and will that improve performance?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
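
The "Found a long-running query" warnings quoted in this thread can be tallied with a short script, which makes it easier to see when many worker threads stalled for roughly the same length of time. This is only a sketch: the regular expression is modeled on the WARN lines quoted above, the sample text abbreviates the SQL with "...", and the function name is hypothetical. It expects unwrapped lines as they would appear in logs/manifoldcf.log, not the mail-wrapped form.

```python
import re

# Pattern modeled on the WARN lines quoted in this thread (an assumption
# about the log layout, not an official format specification).
LONG_QUERY = re.compile(
    r"\(Worker thread '(?P<thread>\d+)'\) - Found a long-running query "
    r"\((?P<ms>\d+) ms\)"
)


def long_running_queries(log_text):
    """Return (thread, milliseconds) pairs, one per long-running query warning."""
    return [
        (m.group("thread"), int(m.group("ms")))
        for m in LONG_QUERY.finditer(log_text)
    ]


# Sample lines based on the thread; the SQL text is abbreviated here.
SAMPLE = (
    " WARN 2014-07-18 11:19:36,505 (Worker thread '39') - "
    "Found a long-running query (458934 ms): [UPDATE hopcount ...]\n"
    " WARN 2014-07-18 11:19:36,505 (Worker thread '4') - "
    "Found a long-running query (420965 ms): [UPDATE hopcount ...]\n"
    " WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 0: 'D'\n"
)

if __name__ == "__main__":
    for thread, ms in long_running_queries(SAMPLE):
        # Report each stalled query in seconds for easier reading.
        print(f"worker {thread}: {ms / 1000:.1f} s")
```

Several threads reporting nearly identical durations at the same timestamp, as in the excerpt above, points to one shared stall rather than many independently slow queries.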
