manifoldcf-user mailing list archives

From Ameya Aware <ameya.aw...@gmail.com>
Subject Re: Performance issues
Date Fri, 18 Jul 2014 19:32:28 GMT
Cool.. it's working perfectly now.

When do I actually need to run the VACUUM FULL command?

Where and how does this command need to be executed?
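For later readers, a minimal sketch of where such commands run: VACUUM is issued as SQL against the MCF database from a PostgreSQL client such as psql. The database and user names below are placeholders.

```
-- Connect to the MCF database with psql (names are placeholders):
--   psql -U manifoldcf -d dbname
-- Routine maintenance; does not take exclusive table locks:
VACUUM ANALYZE;
-- Aggressive space reclamation; takes exclusive locks, so run it
-- only while crawling is paused:
VACUUM FULL;
```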


On Fri, Jul 18, 2014 at 3:17 PM, Karl Wright <daddywri@gmail.com> wrote:

> If you make changes to the code, of course you have to rebuild.  It is up
> to you to preserve your configuration and deployment should you do that.
>
> I will give you one hint though: if you are changing connector code only,
> you can just build the connector.  From the connector directory, type "ant
> deliver-connector" and the connector will be copied into the right place in
> the distribution.
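A minimal sketch of that single-connector rebuild (the connector directory name here is a hypothetical example; substitute the connector you actually changed):

```
# From the changed connector's directory in the MCF source tree
# ("connectors/filesystem" is a hypothetical example):
cd connectors/filesystem
ant deliver-connector   # copies the rebuilt connector into the distribution
```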
>
> Karl
>
>
>
> On Fri, Jul 18, 2014 at 3:12 PM, Ameya Aware <ameya.aware@gmail.com>
> wrote:
>
>> So if I make any changes to the code, do I need to issue the 'ant build'
>> command, or can I simply restart the server for the changes to take effect?
>>
>>
>> On Fri, Jul 18, 2014 at 3:10 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Hi Ameya,
>>>
>>> Rebuilding will of course set your properties back to the build defaults.
>>>
>>> Karl
>>>
>>>
>>>
>>> On Fri, Jul 18, 2014 at 3:08 PM, Ameya Aware <ameya.aware@gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>>
>>>> Am I not supposed to run the 'ant build' command after changing the
>>>> properties.xml file?
>>>>
>>>> Because that is what reset my configured PostgreSQL back to Derby.
>>>>
>>>> Ameya
>>>>
>>>>
>>>> On Fri, Jul 18, 2014 at 2:27 PM, Karl Wright <daddywri@gmail.com>
>>>> wrote:
>>>>
>>>>> Yes.
>>>>> Karl
>>>>>
>>>>>
>>>>> On Fri, Jul 18, 2014 at 2:26 PM, Ameya Aware <ameya.aware@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> So for the Hop filters tab:
>>>>>> [image: Inline image 1]
>>>>>>
>>>>>> are you suggesting to choose the 3rd option, i.e. "Keep unreachable
>>>>>> documents, forever"?
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Ameya
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 18, 2014 at 2:15 PM, Karl Wright <daddywri@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Something else you should be aware of: hop-count filtering is very
>>>>>>> expensive.  If you are using a connector that uses it, and you don't
>>>>>>> need it, you should consider disabling it.  Pick the bottom radio
>>>>>>> button on the Hop Count tab to do that.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 18, 2014 at 1:34 PM, Karl Wright <daddywri@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Ameya,
>>>>>>>>
>>>>>>>> If you are still using Derby, which apparently you are according to
>>>>>>>> the stack trace, then a pause of 420 seconds is likely because Derby
>>>>>>>> got itself stuck.  Derby is like that, which is why we don't
>>>>>>>> recommend it for production.
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 18, 2014 at 1:31 PM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> No Karl,
>>>>>>>>>
>>>>>>>>> I did not do VACUUM here.
>>>>>>>>>
>>>>>>>>> Why would queries stop after running for about 420 seconds? Is it
>>>>>>>>> because of the errors coming in?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jul 18, 2014 at 12:32 PM, Karl Wright <daddywri@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Ameya,
>>>>>>>>>>
>>>>>>>>>> For future reference, when you see stuff like this in the log:
>>>>>>>>>>
>>>>>>>>>> >>>>>>
>>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') - Found a long-running query (458934 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '4') - Found a long-running query (420965 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 0: 'D'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '19') - Found a long-running query (421120 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '10') - Found a long-running query (420985 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '11') - Found a long-running query (421173 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '4') -   Parameter 0: 'D'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '11') -   Parameter 0: 'D'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '10') -   Parameter 0: 'D'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 1: '-1'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '19') -   Parameter 0: 'D'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 2: '1405692432586'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '10') -   Parameter 1: '-1'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '22') - Found a long-running query (421052 ms): [UPDATE hopcount SET deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '11') -   Parameter 1: '-1'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '4') -   Parameter 1: '-1'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '11') -   Parameter 2: '1405692432586'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 0: 'D'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '10') -   Parameter 2: '1405692432586'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 3: '9ABFEB709B646CD0C84B4B7B6300E2C9BD5E3477'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '19') -   Parameter 1: '-1'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '39') -   Parameter 4: 'B'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '10') -   Parameter 3: 'A932EC77CEF156EA26A4239F12BAB365E6B4F58D'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 1: '-1'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '11') -   Parameter 3: '9DFF75EBE13D0AAE8AFF025E992C68AB203ED1CB'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '4') -   Parameter 2: '1405692432586'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '11') -   Parameter 4: 'B'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 2: '1405692432586'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 3: '023FDBD3638711F4E55A918B862A064161B0892A'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 4: 'B'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '10') -   Parameter 4: 'B'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '19') -   Parameter 2: '1405692432586'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '4') -   Parameter 3: '0158B8EDFEE3DDB10113B6D6E378D5FBF165E1FD'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '19') -   Parameter 3: 'FD9641C67D0C1EC22B5F05671513D4DD71B4582C'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '4') -   Parameter 4: 'B'
>>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '19') -   Parameter 4: 'B'
>>>>>>>>>> <<<<<<
>>>>>>>>>>
>>>>>>>>>> ... it means that MANY queries basically stopped running for about
>>>>>>>>>> 420 seconds.  I bet you did a VACUUM then, right?
>>>>>>>>>>
>>>>>>>>>> Karl
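As an aside, warnings like the ones above follow a fixed pattern, so their durations can be pulled out of manifoldcf.log mechanically. A minimal sketch, assuming the log lines match the excerpt above (this helper is not part of ManifoldCF itself):

```python
import re

# Matches the warning format shown in the log excerpt above.
LONG_QUERY = re.compile(r"Found a long-running query \((\d+) ms\)")

def long_query_times(log_lines):
    """Return the durations (in ms) of all long-running-query warnings."""
    return [int(m.group(1))
            for line in log_lines
            for m in LONG_QUERY.finditer(line)]

sample = [
    "WARN 2014-07-18 11:19:36,505 (Worker thread '39') - Found a long-running query (458934 ms): [UPDATE hopcount ...]",
    "WARN 2014-07-18 11:19:36,505 (Worker thread '4') - Found a long-running query (420965 ms): [UPDATE hopcount ...]",
]
print(long_query_times(sample))  # [458934, 420965]
```

A cluster of near-identical durations, as in the excerpt, suggests one event (a lock or a VACUUM) stalled many worker threads at once.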
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 18, 2014 at 12:30 PM, Karl Wright <daddywri@gmail.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Ameya,
>>>>>>>>>>>
>>>>>>>>>>> The log file is full of errors of all sorts.  For example:
>>>>>>>>>>>
>>>>>>>>>>> >>>>>
>>>>>>>>>>>  WARN 2014-07-17 17:32:38,709 (Worker thread '41') - IO exception during indexing file:/C:/Program%20Files/eclipse/configuration/org.eclipse.osgi/.manager/.tmp2043698995563843992.instance: The process cannot access the file because another process has locked a portion of the file
>>>>>>>>>>> java.io.IOException: The process cannot access the file because another process has locked a portion of the file
>>>>>>>>>>>     at java.io.FileInputStream.readBytes(Native Method)
>>>>>>>>>>>     at java.io.FileInputStream.read(Unknown Source)
>>>>>>>>>>>     at org.apache.http.entity.mime.content.InputStreamBody.writeTo(InputStreamBody.java:91)
>>>>>>>>>>>     at org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.doWriteTo(ModifiedHttpMultipart.java:211)
>>>>>>>>>>>     at org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.writeTo(ModifiedHttpMultipart.java:229)
>>>>>>>>>>>     at org.apache.manifoldcf.agents.output.solr.ModifiedMultipartEntity.writeTo(ModifiedMultipartEntity.java:187)
>>>>>>>>>>>     at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>>>>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>>>>>>>>>>>     at java.lang.reflect.Method.invoke(Unknown Source)
>>>>>>>>>>>     at org.apache.http.impl.execchain.RequestEntityExecHandler.invoke(RequestEntityExecHandler.java:77)
>>>>>>>>>>>     at com.sun.proxy.$Proxy0.writeTo(Unknown Source)
>>>>>>>>>>>     at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:155)
>>>>>>>>>>>     at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>>>>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>>>>>>>>>>>     at java.lang.reflect.Method.invoke(Unknown Source)
>>>>>>>>>>>     at org.apache.http.impl.conn.CPoolProxy.invoke(CPoolProxy.java:138)
>>>>>>>>>>>     at com.sun.proxy.$Proxy1.sendRequestEntity(Unknown Source)
>>>>>>>>>>>     at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
>>>>>>>>>>>     at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
>>>>>>>>>>>     at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:254)
>>>>>>>>>>>     at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
>>>>>>>>>>>     at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
>>>>>>>>>>>     at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
>>>>>>>>>>>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>>>>>>>>>>>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
>>>>>>>>>>>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
>>>>>>>>>>>     at org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:292)
>>>>>>>>>>>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
>>>>>>>>>>>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>>>>>>>>>>>     at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:951)
>>>>>>>>>>> <<<<<
>>>>>>>>>>>
>>>>>>>>>>> This error occurs because you are trying to index a file on Windows
>>>>>>>>>>> that is open by an application.  If you do this kind of thing,
>>>>>>>>>>> ManifoldCF will requeue the document and will try it again later --
>>>>>>>>>>> say, in 5 minutes -- and keep retrying it for many hours before it
>>>>>>>>>>> gives up.
>>>>>>>>>>>
>>>>>>>>>>> I suspect that you are not seeing "hangs", but rather situations
>>>>>>>>>>> where MCF is simply waiting for a problem to resolve.
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 18, 2014 at 11:27 AM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Attaching log file
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 18, 2014 at 11:15 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Also, please send the file logs/manifoldcf.log as well -- as a
>>>>>>>>>>>>> text file.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jul 18, 2014 at 11:12 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Could you please get a thread dump and send that to me?  Please
>>>>>>>>>>>>>> send it as a text file, not a screen shot.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To get a thread dump, get the process ID of the agents process,
>>>>>>>>>>>>>> and use the JDK's jstack utility to obtain the dump.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Karl
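A minimal sketch of those two steps, assuming a JDK on the PATH (the output filename is arbitrary):

```
jps -l                          # list running JVMs; note the agents process ID
jstack <pid> > threaddump.txt   # write all thread stacks to a text file
```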
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jul 18, 2014 at 11:08 AM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yeah, I thought so; it should not be an issue for 4000
>>>>>>>>>>>>>>> documents.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am using the file system connector to crawl all of my C
>>>>>>>>>>>>>>> drive, and the output connection is null.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There are no errors in the MCF log. MCF has been stuck at the
>>>>>>>>>>>>>>> same screen for half an hour.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Attaching some snapshots for your reference.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Jul 18, 2014 at 11:02 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Ameya,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 4000 documents is nothing at all.  We have load tests, which
>>>>>>>>>>>>>>>> I run on every release, that include more than 100000
>>>>>>>>>>>>>>>> documents in a crawl.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Can you be more specific about the case that you say "hung
>>>>>>>>>>>>>>>> up"?  Specifically:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> (1) What kind of crawl is this?  SharePoint?  Web?
>>>>>>>>>>>>>>>> (2) Are there any errors in the manifoldcf log?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jul 18, 2014 at 10:59 AM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I spent some time going through the PostgreSQL 9.3 manual.
>>>>>>>>>>>>>>>>> I configured PostgreSQL for MCF and saw a significant
>>>>>>>>>>>>>>>>> improvement in performance.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I ran it yesterday for some 4000 documents. When I started
>>>>>>>>>>>>>>>>> running it again today, the performance was very poor, and
>>>>>>>>>>>>>>>>> after 200 documents it hung up.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is it because of the periodic maintenance PostgreSQL needs?
>>>>>>>>>>>>>>>>> Also, I would like to know where and how exactly the VACUUM
>>>>>>>>>>>>>>>>> FULL command needs to be used.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 2:13 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It is fine; I am running PostgreSQL 9.3 here.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 2:08 PM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is PostgreSQL version 9.3 good? I already have it on my
>>>>>>>>>>>>>>>>>>> machine, though the documentation says "ManifoldCF has been
>>>>>>>>>>>>>>>>>>> tested against version 8.3.7, 8.4.5 and 9.1 of PostgreSQL."
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 1:09 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> If you haven't configured MCF to use PostgreSQL, then you
>>>>>>>>>>>>>>>>>>>> are using Derby, which is not recommended for production
>>>>>>>>>>>>>>>>>>>> use.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Instructions on how to set up MCF to use PostgreSQL are
>>>>>>>>>>>>>>>>>>>> available on the MCF site on the how-to-build-and-deploy
>>>>>>>>>>>>>>>>>>>> page.  Configuring PostgreSQL for millions or tens of
>>>>>>>>>>>>>>>>>>>> millions of documents will require someone to learn about
>>>>>>>>>>>>>>>>>>>> PostgreSQL and how to administer it.  The
>>>>>>>>>>>>>>>>>>>> how-to-build-and-deploy page provides some (old)
>>>>>>>>>>>>>>>>>>>> guidelines and hints, but if I were you I'd read the
>>>>>>>>>>>>>>>>>>>> PostgreSQL manual for the version you install.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Karl
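As a rough sketch of what that setup involves, the switch from Derby to PostgreSQL is made in properties.xml. The property names below are taken from ManifoldCF's documentation, but verify them against the how-to-build-and-deploy page for your release; the credential values are placeholders:

```
<property name="org.apache.manifoldcf.databaseimplementationclass"
          value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
<property name="org.apache.manifoldcf.dbsuperusername" value="postgres"/>
<property name="org.apache.manifoldcf.dbsuperuserpassword" value="your-password"/>
```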
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 1:04 PM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Ooh, ok.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Actually, I have never configured PostgreSQL yet. I am
>>>>>>>>>>>>>>>>>>>>> simply using the binary distribution of MCF to configure
>>>>>>>>>>>>>>>>>>>>> file system connectors to connect to Solr.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Do I need to configure PostgreSQL? How can I proceed
>>>>>>>>>>>>>>>>>>>>> from here to check performance measurements?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 12:10 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Yes.  Also have a look at the how-to-build-and-deploy
>>>>>>>>>>>>>>>>>>>>>> page for hints on how to configure PostgreSQL for
>>>>>>>>>>>>>>>>>>>>>> maximum performance.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ManifoldCF's performance is almost entirely based on
>>>>>>>>>>>>>>>>>>>>>> the database.  If you are using PostgreSQL, which is
>>>>>>>>>>>>>>>>>>>>>> the fastest ManifoldCF choice, you should be able to
>>>>>>>>>>>>>>>>>>>>>> see in the logs when queries take a long time, or when
>>>>>>>>>>>>>>>>>>>>>> indexes are automatically rebuilt.  Could you provide
>>>>>>>>>>>>>>>>>>>>>> any information as to what your overall system setup
>>>>>>>>>>>>>>>>>>>>>> looks like?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 11:32 AM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/performance-tuning.html
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> This page?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 11:28 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi Ameya,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Have you read the performance page?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Sent from my Windows Phone
>>>>>>>>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>>>>>>>>> From: Ameya Aware
>>>>>>>>>>>>>>>>>>>>>>>> Sent: 7/17/2014 11:27 AM
>>>>>>>>>>>>>>>>>>>>>>>> To: user@manifoldcf.apache.org
>>>>>>>>>>>>>>>>>>>>>>>> Subject: Performance issues
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I have millions of documents to crawl and send to Solr.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> But when I run it for thousands of documents, it takes
>>>>>>>>>>>>>>>>>>>>>>>> too much time, or sometimes it even hangs.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> So what could be a way to reduce the crawl time?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Also, I do not need the content of the documents, just
>>>>>>>>>>>>>>>>>>>>>>>> the metadata, so can I skip reading and fetching the
>>>>>>>>>>>>>>>>>>>>>>>> content, and will that improve the crawl time?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
