manifoldcf-user mailing list archives

From Ameya Aware <ameya.aw...@gmail.com>
Subject Re: Performance issues
Date Fri, 18 Jul 2014 19:12:41 GMT
So if I make any changes to the code, do I need to issue the 'ant build'
command, or can I simply restart the server for the changes to take effect?
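For context, a minimal sketch of the rebuild trap Karl describes below: 'ant build' regenerates the example deployment area and overwrites properties.xml with the Derby defaults, while a plain restart does not. The directory layout here is a placeholder simulated in a temp directory, not the real source tree:

```shell
# If you changed only properties.xml, restarting is enough; if you
# changed code, back up properties.xml before rebuilding and restore
# it afterwards, because the rebuild resets it to the build defaults.
MCF=$(mktemp -d)                      # placeholder for the MCF source tree
mkdir -p "$MCF/dist/example"
echo "postgresql settings" > "$MCF/dist/example/properties.xml"  # your edits

cp "$MCF/dist/example/properties.xml" "$MCF/properties.xml.bak"  # 1. back up
echo "derby defaults" > "$MCF/dist/example/properties.xml"       # 2. 'ant build' overwrites the file
cp "$MCF/properties.xml.bak" "$MCF/dist/example/properties.xml"  # 3. restore

cat "$MCF/dist/example/properties.xml"
```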


On Fri, Jul 18, 2014 at 3:10 PM, Karl Wright <daddywri@gmail.com> wrote:

> Hi Ameya,
>
> Rebuilding will of course set your properties back to the build defaults.
>
> Karl
>
>
>
> On Fri, Jul 18, 2014 at 3:08 PM, Ameya Aware <ameya.aware@gmail.com>
> wrote:
>
>> Hi
>>
>> Am I not supposed to run the 'ant build' command after changing the
>> properties.xml file?
>>
>> Because that is what set my configured PostgreSQL back to Derby.
>>
>> Ameya
>>
>>
>> On Fri, Jul 18, 2014 at 2:27 PM, Karl Wright <daddywri@gmail.com> wrote:
>>
>>> Yes.
>>> Karl
>>>
>>>
>>> On Fri, Jul 18, 2014 at 2:26 PM, Ameya Aware <ameya.aware@gmail.com>
>>> wrote:
>>>
>>>> So for Hop filters tab:
>>>> [image: Inline image 1]
>>>>
>>>> are you suggesting I choose the 3rd option, i.e. "Keep unreachable
>>>> documents, forever"?
>>>>
>>>>
>>>> Thanks,
>>>> Ameya
>>>>
>>>>
>>>> On Fri, Jul 18, 2014 at 2:15 PM, Karl Wright <daddywri@gmail.com>
>>>> wrote:
>>>>
>>>>> Something else you should be aware of: Hop-count filtering is very
>>>>> expensive.  If you are using a connector that uses it, and you don't
>>>>> need it, you should consider disabling it.  Pick the bottom radio
>>>>> button on the Hop Count tab to do that.
>>>>>
>>>>> Thanks,
>>>>> Karl
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Jul 18, 2014 at 1:34 PM, Karl Wright <daddywri@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Ameya,
>>>>>>
>>>>>> If you are still using Derby, which apparently you are according to
>>>>>> the stack trace, then a pause of 420 seconds is likely because Derby
>>>>>> got itself stuck.  Derby is like that, which is why we don't
>>>>>> recommend it for production.
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 18, 2014 at 1:31 PM, Ameya Aware <ameya.aware@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> No Karl,
>>>>>>>
>>>>>>> I did not do VACUUM here.
>>>>>>>
>>>>>>> Why would queries stop after running for about 420 seconds?  Is it
>>>>>>> because of the errors coming in?
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 18, 2014 at 12:32 PM, Karl Wright <daddywri@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Ameya,
>>>>>>>>
>>>>>>>> For future reference, when you see stuff like this in the log:
>>>>>>>>
>>>>>>>> >>>>>>
>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') - Found a
>>>>>>>> long-running query (458934 ms): [UPDATE hopcount SET
>>>>>>>> deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM
>>>>>>>> hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND
>>>>>>>> EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid
>>>>>>>> AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash
>>>>>>>> AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '4') - Found a
>>>>>>>> long-running query (420965 ms): [UPDATE hopcount SET
>>>>>>>> deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM
>>>>>>>> hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND
>>>>>>>> EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid
>>>>>>>> AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash
>>>>>>>> AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 0: 'D'
>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '19') - Found a
>>>>>>>> long-running query (421120 ms): [UPDATE hopcount SET
>>>>>>>> deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM
>>>>>>>> hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND
>>>>>>>> EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid
>>>>>>>> AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash
>>>>>>>> AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '10') - Found a
>>>>>>>> long-running query (420985 ms): [UPDATE hopcount SET
>>>>>>>> deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM
>>>>>>>> hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND
>>>>>>>> EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid
>>>>>>>> AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash
>>>>>>>> AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '11') - Found a
>>>>>>>> long-running query (421173 ms): [UPDATE hopcount SET
>>>>>>>> deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM
>>>>>>>> hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND
>>>>>>>> EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid
>>>>>>>> AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash
>>>>>>>> AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '4') -   Parameter 0: 'D'
>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '11') -   Parameter 0: 'D'
>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '10') -   Parameter 0: 'D'
>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 1: '-1'
>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '19') -   Parameter 0: 'D'
>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 2: '1405692432586'
>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '10') -   Parameter 1: '-1'
>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '22') - Found a
>>>>>>>> long-running query (421052 ms): [UPDATE hopcount SET
>>>>>>>> deathmark=?,distance=? WHERE id IN(SELECT ownerid FROM
>>>>>>>> hopdeletedeps t0 WHERE t0.jobid=? AND t0.childidhash=? AND
>>>>>>>> EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE t1.jobid=t0.jobid
>>>>>>>> AND t1.linktype=t0.linktype AND t1.parentidhash=t0.parentidhash
>>>>>>>> AND t1.childidhash=t0.childidhash AND t1.isnew=?))]
>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '11') -   Parameter 1: '-1'
>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '4') -   Parameter 1: '-1'
>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '11') -   Parameter 2: '1405692432586'
>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 0: 'D'
>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '10') -   Parameter 2: '1405692432586'
>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') -   Parameter 3: '9ABFEB709B646CD0C84B4B7B6300E2C9BD5E3477'
>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '19') -   Parameter 1: '-1'
>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '39') -   Parameter 4: 'B'
>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '10') -   Parameter 3: 'A932EC77CEF156EA26A4239F12BAB365E6B4F58D'
>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 1: '-1'
>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '11') -   Parameter 3: '9DFF75EBE13D0AAE8AFF025E992C68AB203ED1CB'
>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '4') -   Parameter 2: '1405692432586'
>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '11') -   Parameter 4: 'B'
>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 2: '1405692432586'
>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 3: '023FDBD3638711F4E55A918B862A064161B0892A'
>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') -   Parameter 4: 'B'
>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '10') -   Parameter 4: 'B'
>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '19') -   Parameter 2: '1405692432586'
>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '4') -   Parameter 3: '0158B8EDFEE3DDB10113B6D6E378D5FBF165E1FD'
>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '19') -   Parameter 3: 'FD9641C67D0C1EC22B5F05671513D4DD71B4582C'
>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '4') -   Parameter 4: 'B'
>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '19') -   Parameter 4: 'B'
>>>>>>>> <<<<<<
>>>>>>>>
>>>>>>>> ... it means that MANY queries basically stopped running for
>>>>>>>> about 420 seconds.  I bet you did a VACUUM then, right?
>>>>>>>>
>>>>>>>> Karl
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 18, 2014 at 12:30 PM, Karl Wright <daddywri@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Ameya,
>>>>>>>>>
>>>>>>>>> The log file is full of errors of all sorts.  For example:
>>>>>>>>>
>>>>>>>>> >>>>>
>>>>>>>>>  WARN 2014-07-17 17:32:38,709 (Worker thread '41') - IO
>>>>>>>>> exception during indexing
>>>>>>>>> file:/C:/Program%20Files/eclipse/configuration/org.eclipse.osgi/.manager/.tmp2043698995563843992.instance:
>>>>>>>>> The process cannot access the file because another process has
>>>>>>>>> locked a portion of the file
>>>>>>>>> java.io.IOException: The process cannot access the file because
>>>>>>>>> another process has locked a portion of the file
>>>>>>>>>     at java.io.FileInputStream.readBytes(Native Method)
>>>>>>>>>     at java.io.FileInputStream.read(Unknown Source)
>>>>>>>>>     at
>>>>>>>>> org.apache.http.entity.mime.content.InputStreamBody.writeTo(InputStreamBody.java:91)
>>>>>>>>>     at
>>>>>>>>> org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.doWriteTo(ModifiedHttpMultipart.java:211)
>>>>>>>>>     at
>>>>>>>>> org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.writeTo(ModifiedHttpMultipart.java:229)
>>>>>>>>>     at
>>>>>>>>> org.apache.manifoldcf.agents.output.solr.ModifiedMultipartEntity.writeTo(ModifiedMultipartEntity.java:187)
>>>>>>>>>     at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
>>>>>>>>> Source)
>>>>>>>>>     at java.lang.reflect.Method.invoke(Unknown Source)
>>>>>>>>>     at
>>>>>>>>> org.apache.http.impl.execchain.RequestEntityExecHandler.invoke(RequestEntityExecHandler.java:77)
>>>>>>>>>     at com.sun.proxy.$Proxy0.writeTo(Unknown Source)
>>>>>>>>>     at
>>>>>>>>> org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:155)
>>>>>>>>>     at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
>>>>>>>>> Source)
>>>>>>>>>     at java.lang.reflect.Method.invoke(Unknown Source)
>>>>>>>>>     at
>>>>>>>>> org.apache.http.impl.conn.CPoolProxy.invoke(CPoolProxy.java:138)
>>>>>>>>>     at com.sun.proxy.$Proxy1.sendRequestEntity(Unknown Source)
>>>>>>>>>     at
>>>>>>>>> org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
>>>>>>>>>     at
>>>>>>>>> org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
>>>>>>>>>     at
>>>>>>>>> org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:254)
>>>>>>>>>     at
>>>>>>>>> org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
>>>>>>>>>     at
>>>>>>>>> org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
>>>>>>>>>     at
>>>>>>>>> org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
>>>>>>>>>     at
>>>>>>>>> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>>>>>>>>>     at
>>>>>>>>> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
>>>>>>>>>     at
>>>>>>>>> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
>>>>>>>>>     at
>>>>>>>>> org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:292)
>>>>>>>>>     at
>>>>>>>>> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
>>>>>>>>>     at
>>>>>>>>> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>>>>>>>>>     at
>>>>>>>>> org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:951)
>>>>>>>>> <<<<<
>>>>>>>>>
>>>>>>>>> This error occurs because you are trying to index a file on
>>>>>>>>> Windows that is open in another application.  If you do this
>>>>>>>>> kind of thing, ManifoldCF will requeue the document and will try
>>>>>>>>> it again later -- say, in 5 minutes -- and keep retrying it for
>>>>>>>>> many hours before it gives up.
>>>>>>>>>
>>>>>>>>> I suspect that you are not seeing "hangs", but rather situations
>>>>>>>>> where MCF is simply waiting for a problem to resolve.
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jul 18, 2014 at 11:27 AM, Ameya Aware <
>>>>>>>>> ameya.aware@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Attaching log file
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 18, 2014 at 11:15 AM, Karl Wright <daddywri@gmail.com
>>>>>>>>>> > wrote:
>>>>>>>>>>
>>>>>>>>>>> Also, please send the file logs/manifoldcf.log as well -- as a
>>>>>>>>>>> text file.
>>>>>>>>>>>
>>>>>>>>>>> Karl
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 18, 2014 at 11:12 AM, Karl Wright <
>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Could you please get a thread dump and send that to me?
>>>>>>>>>>>> Please send it as a text file, not a screen shot.
>>>>>>>>>>>>
>>>>>>>>>>>> To get a thread dump, get the process ID of the agents
>>>>>>>>>>>> process, and use the JDK's jstack utility to obtain the dump.
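For reference, the two steps above can be sketched in shell. jps and jstack are standard JDK tools; the list_jvms function here is a placeholder standing in for 'jps -l', and the AgentRun class name is an assumption about what the agents process shows:

```shell
# Sketch: find the agents process ID, then dump its threads with jstack.
# 'list_jvms' is a placeholder for the JDK's 'jps -l'; the AgentRun
# class name is an assumption -- match your actual agents process.
list_jvms() { echo "12345 org.apache.manifoldcf.agents.AgentRun"; }
PID=$(list_jvms | grep AgentRun | awk '{print $1}')
echo "$PID"
# jstack "$PID" > threaddump.txt   # real step: write the dump as a text file
```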
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 18, 2014 at 11:08 AM, Ameya Aware <
>>>>>>>>>>>> ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Yeah, I thought so; it should not be a problem with 4000
>>>>>>>>>>>>> documents.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am using the filesystem connector to crawl all of my C
>>>>>>>>>>>>> drive, and the output connection is null.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There are no error logs in MCF.  MCF has been standing still
>>>>>>>>>>>>> at the same screen for half an hour.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Attaching some snapshots for your reference.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jul 18, 2014 at 11:02 AM, Karl Wright <
>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Ameya,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> 4000 documents is nothing at all.  We have load tests,
>>>>>>>>>>>>>> which I run on every release, that include more than 100000
>>>>>>>>>>>>>> documents in a crawl.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you be more specific about the case where you say it
>>>>>>>>>>>>>> "hung up"?  Specifically:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> (1) What kind of crawl is this?  SharePoint?  Web?
>>>>>>>>>>>>>> (2) Are there any errors in the manifoldcf log?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jul 18, 2014 at 10:59 AM, Ameya Aware <
>>>>>>>>>>>>>> ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I spent some time going through the PostgreSQL 9.3 manual.
>>>>>>>>>>>>>>> I configured PostgreSQL for MCF and saw a significant
>>>>>>>>>>>>>>> improvement in performance.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I ran it yesterday for some 4000 documents.  When I
>>>>>>>>>>>>>>> started running it again today, the performance was very
>>>>>>>>>>>>>>> poor, and after 200 documents it hung up.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is it because of the periodic maintenance it needs?  Also,
>>>>>>>>>>>>>>> I would like to know where and how exactly the VACUUM FULL
>>>>>>>>>>>>>>> command needs to be used.
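For reference, a sketch of how VACUUM is typically issued against the MCF database from psql; 'dbname' and 'manifoldcf' below are placeholder names for your database and user:

```shell
# Plain VACUUM ANALYZE reclaims space and refreshes planner statistics
# and can run while crawls are active.  VACUUM FULL rewrites tables and
# takes exclusive locks, so stop the agents process (or pause all jobs)
# before running it.  'dbname' and 'manifoldcf' are placeholders.
psql -U manifoldcf -d dbname -c "VACUUM ANALYZE;"
# With crawling stopped:
psql -U manifoldcf -d dbname -c "VACUUM FULL;"
```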
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 2:13 PM, Karl Wright <
>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It is fine; I am running PostgreSQL 9.3 here.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 2:08 PM, Ameya Aware <
>>>>>>>>>>>>>>>> ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Is PostgreSQL 9.3 a good version?  Because I already
>>>>>>>>>>>>>>>>> have it on my machine, though the documentation says
>>>>>>>>>>>>>>>>> "ManifoldCF has been tested against versions 8.3.7,
>>>>>>>>>>>>>>>>> 8.4.5 and 9.1 of PostgreSQL."
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 1:09 PM, Karl Wright <
>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> If you haven't configured MCF to use PostgreSQL, then
>>>>>>>>>>>>>>>>>> you are using Derby, which is not recommended for
>>>>>>>>>>>>>>>>>> production use.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Instructions on how to set up MCF to use PostgreSQL are
>>>>>>>>>>>>>>>>>> available on the MCF site, on the
>>>>>>>>>>>>>>>>>> how-to-build-and-deploy page.  Configuring PostgreSQL
>>>>>>>>>>>>>>>>>> for millions or tens of millions of documents will
>>>>>>>>>>>>>>>>>> require someone to learn about PostgreSQL and how to
>>>>>>>>>>>>>>>>>> administer it.  The how-to-build-and-deploy page
>>>>>>>>>>>>>>>>>> provides some (old) guidelines and hints, but if I were
>>>>>>>>>>>>>>>>>> you I'd read the PostgreSQL manual for the version you
>>>>>>>>>>>>>>>>>> install.
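For reference, the switch from Derby to PostgreSQL is made in properties.xml. A sketch of the relevant entries, with property names as given on the how-to-build-and-deploy page and placeholder values for the local database, user, and password:

```xml
<!-- Sketch of the PostgreSQL entries in properties.xml; values are
     placeholders for your local setup. -->
<property name="org.apache.manifoldcf.databaseimplementationclass"
          value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
<property name="org.apache.manifoldcf.database.name" value="dbname"/>
<property name="org.apache.manifoldcf.database.username" value="manifoldcf"/>
<property name="org.apache.manifoldcf.database.password" value="local_pg_password"/>
```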
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 1:04 PM, Ameya Aware <
>>>>>>>>>>>>>>>>>> ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Ooh ok.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Actually, I have never configured PostgreSQL yet.  I
>>>>>>>>>>>>>>>>>>> am simply using the binary distribution of MCF to
>>>>>>>>>>>>>>>>>>> configure file system connectors to connect to Solr.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Do I need to configure PostgreSQL?  How can I proceed
>>>>>>>>>>>>>>>>>>> from here to check performance measurements?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 12:10 PM, Karl Wright <
>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Yes.  Also have a look at the how-to-build-and-deploy
>>>>>>>>>>>>>>>>>>>> page for hints on how to configure PostgreSQL for
>>>>>>>>>>>>>>>>>>>> maximum performance.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> ManifoldCF's performance is almost entirely based on
>>>>>>>>>>>>>>>>>>>> the database.  If you are using PostgreSQL, which is
>>>>>>>>>>>>>>>>>>>> the fastest ManifoldCF choice, you should be able to
>>>>>>>>>>>>>>>>>>>> see in the logs when queries take a long time, or
>>>>>>>>>>>>>>>>>>>> when indexes are automatically rebuilt.  Could you
>>>>>>>>>>>>>>>>>>>> provide any information as to what your overall
>>>>>>>>>>>>>>>>>>>> system setup looks like?
>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 11:32 AM, Ameya Aware <
>>>>>>>>>>>>>>>>>>>> ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/performance-tuning.html
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> This page?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 11:28 AM, Karl Wright <
>>>>>>>>>>>>>>>>>>>>> daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Ameya,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Have you read the performance page?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Sent from my Windows Phone
>>>>>>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>>>>>>> From: Ameya Aware
>>>>>>>>>>>>>>>>>>>>>> Sent: 7/17/2014 11:27 AM
>>>>>>>>>>>>>>>>>>>>>> To: user@manifoldcf.apache.org
>>>>>>>>>>>>>>>>>>>>>> Subject: Performance issues
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I have millions of documents to crawl and send to
>>>>>>>>>>>>>>>>>>>>>> Solr.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> But when I run it for thousands of documents, it
>>>>>>>>>>>>>>>>>>>>>> takes too much time, or sometimes it even hangs up.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> So what could be a way to reduce the crawl time?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Also, I do not need the content of the documents, I
>>>>>>>>>>>>>>>>>>>>>> just need metadata.  Can I skip reading and
>>>>>>>>>>>>>>>>>>>>>> fetching the content, and will that improve
>>>>>>>>>>>>>>>>>>>>>> performance?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
