manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: Performance issues
Date Fri, 18 Jul 2014 19:17:55 GMT
If you make changes to the code, of course you have to rebuild.  It is up
to you to preserve your configuration and deployment should you do that.

I will give you one hint though: if you are changing connector code only,
you can just build the connector.  From the connector directory, type "ant
deliver-connector" and the connector will be copied into the right place in
the distribution.
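As a minimal sketch (the "connectors/filesystem" path here is just a hypothetical example; substitute the directory of the connector you actually changed):

```shell
#!/bin/sh
# Rebuild a single connector instead of the whole distribution.
# "connectors/filesystem" is a hypothetical example path -- use the
# directory of the connector you actually modified.
CONNECTOR_DIR="connectors/filesystem"
if [ -d "$CONNECTOR_DIR" ]; then
  # The deliver-connector target rebuilds just this connector and copies
  # the result into the right place in the distribution.
  (cd "$CONNECTOR_DIR" && ant deliver-connector)
  RESULT="delivered"
else
  RESULT="connector directory not found: $CONNECTOR_DIR"
  echo "$RESULT -- run this from the MCF source root"
fi
```

This avoids a full "ant build", which would also regenerate your configuration with the build defaults.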

Karl



On Fri, Jul 18, 2014 at 3:12 PM, Ameya Aware <ameya.aware@gmail.com> wrote:

> So if I make any changes to the code, do I need to issue the 'ant build'
> command, or can I simply restart the server for the changes to take effect?
>
>
> On Fri, Jul 18, 2014 at 3:10 PM, Karl Wright <daddywri@gmail.com> wrote:
>
>> Hi Ameya,
>>
>> Rebuilding will of course set your properties back to the build defaults.
>>
>> Karl
>>
>>
>>
>> On Fri, Jul 18, 2014 at 3:08 PM, Ameya Aware <ameya.aware@gmail.com>
>> wrote:
>>
>>> Hi
>>>
>>> Am I not supposed to run the 'ant build' command after changing the
>>> properties.xml file?
>>>
>>> Because that is what set my configured PostgreSQL back to Derby.
>>>
>>> Ameya
>>>
>>>
>>> On Fri, Jul 18, 2014 at 2:27 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>
>>>> Yes.
>>>> Karl
>>>>
>>>>
>>>> On Fri, Jul 18, 2014 at 2:26 PM, Ameya Aware <ameya.aware@gmail.com>
>>>> wrote:
>>>>
>>>>> So for the Hop filters tab:
>>>>> [image: Inline image 1]
>>>>>
>>>>> are you suggesting choosing the 3rd option, i.e. "Keep unreachable
>>>>> documents, forever"?
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Ameya
>>>>>
>>>>>
>>>>> On Fri, Jul 18, 2014 at 2:15 PM, Karl Wright <daddywri@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Something else you should be aware of: Hop-count filtering is very
>>>>>> expensive.  If you are using a connector that uses it, and you don't
>>>>>> need it, you should consider disabling it.  Pick the bottom radio
>>>>>> button on the Hop Count tab to do that.
>>>>>>
>>>>>> Thanks,
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Jul 18, 2014 at 1:34 PM, Karl Wright <daddywri@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Ameya,
>>>>>>>
>>>>>>> If you are still using Derby, which apparently you are according to
>>>>>>> the stack trace, then a pause of 420 seconds is likely because Derby
>>>>>>> got itself stuck.  Derby is like that, which is why we don't recommend
>>>>>>> it for production.
>>>>>>>
>>>>>>> Karl
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Jul 18, 2014 at 1:31 PM, Ameya Aware <ameya.aware@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> No Karl,
>>>>>>>>
>>>>>>>> I did not do VACUUM here.
>>>>>>>>
>>>>>>>> Why would queries stop after running for about 420 sec?  Is it
>>>>>>>> because of the errors coming in?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Jul 18, 2014 at 12:32 PM, Karl Wright <daddywri@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Ameya,
>>>>>>>>>
>>>>>>>>> For future reference, when you see stuff like this in the log:
>>>>>>>>>
>>>>>>>>> >>>>>>
>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') - Found a
>>>>>>>>> long-running query (458934 ms): [UPDATE hopcount SET deathmark=?,distance=?
>>>>>>>>> WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND
>>>>>>>>> t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE
>>>>>>>>> t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND
>>>>>>>>> t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND
>>>>>>>>> t1.isnew=?))]
>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '4') - Found a
>>>>>>>>> long-running query (420965 ms): [UPDATE hopcount SET deathmark=?,distance=?
>>>>>>>>> WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND
>>>>>>>>> t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE
>>>>>>>>> t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND
>>>>>>>>> t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND
>>>>>>>>> t1.isnew=?))]
>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') - Parameter 0: 'D'
>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '19') - Found a
>>>>>>>>> long-running query (421120 ms): [UPDATE hopcount SET deathmark=?,distance=?
>>>>>>>>> WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND
>>>>>>>>> t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE
>>>>>>>>> t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND
>>>>>>>>> t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND
>>>>>>>>> t1.isnew=?))]
>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '10') - Found a
>>>>>>>>> long-running query (420985 ms): [UPDATE hopcount SET deathmark=?,distance=?
>>>>>>>>> WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND
>>>>>>>>> t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE
>>>>>>>>> t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND
>>>>>>>>> t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND
>>>>>>>>> t1.isnew=?))]
>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '11') - Found a
>>>>>>>>> long-running query (421173 ms): [UPDATE hopcount SET deathmark=?,distance=?
>>>>>>>>> WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND
>>>>>>>>> t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE
>>>>>>>>> t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND
>>>>>>>>> t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND
>>>>>>>>> t1.isnew=?))]
>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '4') - Parameter 0: 'D'
>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '11') - Parameter 0: 'D'
>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '10') - Parameter 0: 'D'
>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') - Parameter 1: '-1'
>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '19') - Parameter 0: 'D'
>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') - Parameter 2: '1405692432586'
>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '10') - Parameter 1: '-1'
>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '22') - Found a
>>>>>>>>> long-running query (421052 ms): [UPDATE hopcount SET deathmark=?,distance=?
>>>>>>>>> WHERE id IN(SELECT ownerid FROM hopdeletedeps t0 WHERE t0.jobid=? AND
>>>>>>>>> t0.childidhash=? AND EXISTS(SELECT 'x' FROM intrinsiclink t1 WHERE
>>>>>>>>> t1.jobid=t0.jobid AND t1.linktype=t0.linktype AND
>>>>>>>>> t1.parentidhash=t0.parentidhash AND t1.childidhash=t0.childidhash AND
>>>>>>>>> t1.isnew=?))]
>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '11') - Parameter 1: '-1'
>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '4') - Parameter 1: '-1'
>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '11') - Parameter 2: '1405692432586'
>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') - Parameter 0: 'D'
>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '10') - Parameter 2: '1405692432586'
>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '39') - Parameter 3: '9ABFEB709B646CD0C84B4B7B6300E2C9BD5E3477'
>>>>>>>>>  WARN 2014-07-18 11:19:36,505 (Worker thread '19') - Parameter 1: '-1'
>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '39') - Parameter 4: 'B'
>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '10') - Parameter 3: 'A932EC77CEF156EA26A4239F12BAB365E6B4F58D'
>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') - Parameter 1: '-1'
>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '11') - Parameter 3: '9DFF75EBE13D0AAE8AFF025E992C68AB203ED1CB'
>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '4') - Parameter 2: '1405692432586'
>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '11') - Parameter 4: 'B'
>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') - Parameter 2: '1405692432586'
>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') - Parameter 3: '023FDBD3638711F4E55A918B862A064161B0892A'
>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '22') - Parameter 4: 'B'
>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '10') - Parameter 4: 'B'
>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '19') - Parameter 2: '1405692432586'
>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '4') - Parameter 3: '0158B8EDFEE3DDB10113B6D6E378D5FBF165E1FD'
>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '19') - Parameter 3: 'FD9641C67D0C1EC22B5F05671513D4DD71B4582C'
>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '4') - Parameter 4: 'B'
>>>>>>>>>  WARN 2014-07-18 11:19:36,506 (Worker thread '19') - Parameter 4: 'B'
>>>>>>>>> <<<<<<
>>>>>>>>>
>>>>>>>>> ... it means that MANY queries basically stopped running for about
>>>>>>>>> 420 seconds.  I bet you did a VACUUM then, right?
>>>>>>>>>
>>>>>>>>> Karl
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Jul 18, 2014 at 12:30 PM, Karl Wright <daddywri@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Ameya,
>>>>>>>>>>
>>>>>>>>>> The log file is full of errors of all sorts.  For example:
>>>>>>>>>>
>>>>>>>>>> >>>>>
>>>>>>>>>>  WARN 2014-07-17 17:32:38,709 (Worker thread '41') - IO exception
>>>>>>>>>> during indexing
>>>>>>>>>> file:/C:/Program%20Files/eclipse/configuration/org.eclipse.osgi/.manager/.tmp2043698995563843992.instance:
>>>>>>>>>> The process cannot access the file because another process has
>>>>>>>>>> locked a portion of the file
>>>>>>>>>> java.io.IOException: The process cannot access the file because
>>>>>>>>>> another process has locked a portion of the file
>>>>>>>>>>     at java.io.FileInputStream.readBytes(Native Method)
>>>>>>>>>>     at java.io.FileInputStream.read(Unknown Source)
>>>>>>>>>>     at org.apache.http.entity.mime.content.InputStreamBody.writeTo(InputStreamBody.java:91)
>>>>>>>>>>     at org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.doWriteTo(ModifiedHttpMultipart.java:211)
>>>>>>>>>>     at org.apache.manifoldcf.agents.output.solr.ModifiedHttpMultipart.writeTo(ModifiedHttpMultipart.java:229)
>>>>>>>>>>     at org.apache.manifoldcf.agents.output.solr.ModifiedMultipartEntity.writeTo(ModifiedMultipartEntity.java:187)
>>>>>>>>>>     at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>>>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>>>>>>>>>>     at java.lang.reflect.Method.invoke(Unknown Source)
>>>>>>>>>>     at org.apache.http.impl.execchain.RequestEntityExecHandler.invoke(RequestEntityExecHandler.java:77)
>>>>>>>>>>     at com.sun.proxy.$Proxy0.writeTo(Unknown Source)
>>>>>>>>>>     at org.apache.http.impl.DefaultBHttpClientConnection.sendRequestEntity(DefaultBHttpClientConnection.java:155)
>>>>>>>>>>     at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>>>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>>>>>>>>>>     at java.lang.reflect.Method.invoke(Unknown Source)
>>>>>>>>>>     at org.apache.http.impl.conn.CPoolProxy.invoke(CPoolProxy.java:138)
>>>>>>>>>>     at com.sun.proxy.$Proxy1.sendRequestEntity(Unknown Source)
>>>>>>>>>>     at org.apache.http.protocol.HttpRequestExecutor.doSendRequest(HttpRequestExecutor.java:236)
>>>>>>>>>>     at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
>>>>>>>>>>     at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:254)
>>>>>>>>>>     at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:195)
>>>>>>>>>>     at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:108)
>>>>>>>>>>     at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:186)
>>>>>>>>>>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
>>>>>>>>>>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:106)
>>>>>>>>>>     at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:57)
>>>>>>>>>>     at org.apache.manifoldcf.agents.output.solr.ModifiedHttpSolrServer.request(ModifiedHttpSolrServer.java:292)
>>>>>>>>>>     at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
>>>>>>>>>>     at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>>>>>>>>>>     at org.apache.manifoldcf.agents.output.solr.HttpPoster$IngestThread.run(HttpPoster.java:951)
>>>>>>>>>> <<<<<
>>>>>>>>>>
>>>>>>>>>> This error occurs because you are trying to index a file on
>>>>>>>>>> Windows that is open by an application.  If you do this kind of
>>>>>>>>>> thing, ManifoldCF will requeue the document and will try it again
>>>>>>>>>> later -- say, in 5 minutes, and keep retrying it for many hours
>>>>>>>>>> before it gives up.
>>>>>>>>>>
>>>>>>>>>> I suspect that you are not seeing "hangs", but rather situations
>>>>>>>>>> where MCF is simply waiting for a problem to resolve.
>>>>>>>>>>
>>>>>>>>>> Karl
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Fri, Jul 18, 2014 at 11:27 AM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Attaching log file
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Jul 18, 2014 at 11:15 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Also, please send the file logs/manifoldcf.log as well -- as a
>>>>>>>>>>>> text file.
>>>>>>>>>>>>
>>>>>>>>>>>> Karl
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 18, 2014 at 11:12 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Could you please get a thread dump and send that to me?
>>>>>>>>>>>>> Please send it as a text file, not a screenshot.
>>>>>>>>>>>>>
>>>>>>>>>>>>> To get a thread dump, get the process ID of the agents process,
>>>>>>>>>>>>> and use the JDK's jstack utility to obtain the dump.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Karl
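In shell terms, those two steps might look like the sketch below; the "manifoldcf" match pattern is an assumption, so confirm the actual agents process name with `jps -l` first.

```shell
#!/bin/sh
# Step 1: find the process ID of the agents process via the JDK's jps tool.
# The "manifoldcf" pattern is an assumption -- check "jps -l" output for the
# actual agents main class on your install.
AGENTS_PID=$(jps -l 2>/dev/null | awk '/manifoldcf/ {print $1; exit}')
if [ -n "$AGENTS_PID" ]; then
  # Step 2: dump all thread stacks to a plain text file, ready to attach.
  jstack "$AGENTS_PID" > threaddump.txt
  STATUS="thread dump written to threaddump.txt"
else
  STATUS="agents process not found (is it running, and is the JDK on PATH?)"
fi
echo "$STATUS"
```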
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Fri, Jul 18, 2014 at 11:08 AM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Yeah.. I thought so; it should not be an issue with 4000
>>>>>>>>>>>>>> documents.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am using the filesystem connector to crawl all of my C drive,
>>>>>>>>>>>>>> and the output connection is null.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There are no error logs in MCF.  MCF has been at a standstill
>>>>>>>>>>>>>> on the same screen for half an hour.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Attaching some snapshots for your reference.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jul 18, 2014 at 11:02 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Ameya,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 4000 documents is nothing at all.  We have load tests, which I
>>>>>>>>>>>>>>> run on every release, that include more than 100000 documents
>>>>>>>>>>>>>>> in a crawl.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Can you be more specific about the case where you say it "hung
>>>>>>>>>>>>>>> up"?  Specifically:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> (1) What kind of crawl is this?  SharePoint?  Web?
>>>>>>>>>>>>>>> (2) Are there any errors in the manifoldcf log?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Fri, Jul 18, 2014 at 10:59 AM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Karl,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I spent some time going through the PostgreSQL 9.3 manual.
>>>>>>>>>>>>>>>> I configured PostgreSQL for MCF and saw a significant
>>>>>>>>>>>>>>>> improvement in performance.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I ran it yesterday for some 4000 documents.  When I started
>>>>>>>>>>>>>>>> running it again today, the performance was very poor, and
>>>>>>>>>>>>>>>> after 200 documents it hung up.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is it because of the periodic maintenance it needs?  Also, I
>>>>>>>>>>>>>>>> would like to know where and how exactly the VACUUM FULL
>>>>>>>>>>>>>>>> command needs to be used.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Ameya
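As an illustration (not from this thread): routine PostgreSQL maintenance is usually plain VACUUM ANALYZE rather than VACUUM FULL, since VACUUM FULL rewrites tables under an exclusive lock and stalls every crawler query while it runs, consistent with the 420-second pauses discussed above. A sketch, with "dbname" and "dbuser" as placeholders:

```shell
#!/bin/sh
# Sketch of periodic maintenance for the MCF database.  "dbname" and
# "dbuser" are placeholders; adjust to your setup.  Plain VACUUM ANALYZE
# can run alongside a crawl; save VACUUM FULL (exclusive lock) for times
# when the crawler is stopped.
DB="dbname"
MAINT_SQL="VACUUM ANALYZE;"
if command -v psql >/dev/null 2>&1; then
  psql -U dbuser -d "$DB" -c "$MAINT_SQL" \
    || echo "psql failed -- check database name and credentials"
else
  echo "psql not on PATH; would have run: psql -d $DB -c \"$MAINT_SQL\""
fi
```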
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 2:13 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It is fine; I am running PostgreSQL 9.3 here.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 2:08 PM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Is PostgreSQL 9.3 good?  Because I already have it on my
>>>>>>>>>>>>>>>>>> machine, though the documentation says "ManifoldCF has been
>>>>>>>>>>>>>>>>>> tested against version 8.3.7, 8.4.5 and 9.1 of PostgreSQL."
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 1:09 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> If you haven't configured MCF to use PostgreSQL, then you
>>>>>>>>>>>>>>>>>>> are using Derby, which is not recommended for production
>>>>>>>>>>>>>>>>>>> use.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Instructions on how to set up MCF to use PostgreSQL are
>>>>>>>>>>>>>>>>>>> available on the MCF site on the how-to-build-and-deploy
>>>>>>>>>>>>>>>>>>> page.  Configuring PostgreSQL for millions or tens of
>>>>>>>>>>>>>>>>>>> millions of documents will require someone to learn about
>>>>>>>>>>>>>>>>>>> PostgreSQL and how to administer it.  The
>>>>>>>>>>>>>>>>>>> how-to-build-and-deploy page provides some (old) guidelines
>>>>>>>>>>>>>>>>>>> and hints, but if I were you I'd read the PostgreSQL manual
>>>>>>>>>>>>>>>>>>> for the version you install.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Karl
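For reference, the database switch lives in MCF's properties.xml. The fragment below is a sketch from memory, not an authoritative listing: verify the property names and the <configuration> wrapper against the how-to-build-and-deploy page for your MCF version, and treat the name/username/password values as placeholders.

```xml
<configuration>
  <!-- Switch the database implementation from the Derby default to
       PostgreSQL (property names from memory; verify for your version) -->
  <property name="org.apache.manifoldcf.databaseimplementationclass"
            value="org.apache.manifoldcf.core.database.DBInterfacePostgreSQL"/>
  <!-- Placeholder credentials; match them to the database you created -->
  <property name="org.apache.manifoldcf.database.name" value="dbname"/>
  <property name="org.apache.manifoldcf.database.username" value="manifoldcf"/>
  <property name="org.apache.manifoldcf.database.password" value="password"/>
</configuration>
```

Note that a rebuild regenerates this file with the Derby defaults (as happened later in this thread), so keep a copy of your edited version outside the build tree.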
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 1:04 PM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Ooh ok.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Actually I have never configured PostgreSQL yet.  I am
>>>>>>>>>>>>>>>>>>>> simply using the binary distribution of MCF to configure
>>>>>>>>>>>>>>>>>>>> file system connectors to connect to Solr.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Do I need to configure PostgreSQL?  How can I proceed from
>>>>>>>>>>>>>>>>>>>> here to check performance measurements?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 12:10 PM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Yes.  Also have a look at the how-to-build-and-deploy
>>>>>>>>>>>>>>>>>>>>> page for hints on how to configure PostgreSQL for maximum
>>>>>>>>>>>>>>>>>>>>> performance.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> ManifoldCF's performance is almost entirely based on the
>>>>>>>>>>>>>>>>>>>>> database.  If you are using PostgreSQL, which is the
>>>>>>>>>>>>>>>>>>>>> fastest ManifoldCF choice, you should be able to see in
>>>>>>>>>>>>>>>>>>>>> the logs when queries take a long time, or when indexes
>>>>>>>>>>>>>>>>>>>>> are automatically rebuilt.  Could you provide any
>>>>>>>>>>>>>>>>>>>>> information as to what your overall system setup looks
>>>>>>>>>>>>>>>>>>>>> like?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 11:32 AM, Ameya Aware <ameya.aware@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> http://manifoldcf.apache.org/release/trunk/en_US/performance-tuning.html
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> This page?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, Jul 17, 2014 at 11:28 AM, Karl Wright <daddywri@gmail.com> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi Ameya,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Have you read the performance page?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Karl
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Sent from my Windows Phone
>>>>>>>>>>>>>>>>>>>>>>> ------------------------------
>>>>>>>>>>>>>>>>>>>>>>> From: Ameya Aware
>>>>>>>>>>>>>>>>>>>>>>> Sent: 7/17/2014 11:27 AM
>>>>>>>>>>>>>>>>>>>>>>> To: user@manifoldcf.apache.org
>>>>>>>>>>>>>>>>>>>>>>> Subject: Performance issues
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I have millions of documents to crawl and send to Solr.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> But when I run it for thousands of documents, it takes
>>>>>>>>>>>>>>>>>>>>>>> too much time, or sometimes it even hangs up.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> So what could be the way to reduce the time it takes?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Also, I do not need the content of the documents, I
>>>>>>>>>>>>>>>>>>>>>>> just need the metadata, so can I skip the content part
>>>>>>>>>>>>>>>>>>>>>>> from reading and fetching, and will that improve
>>>>>>>>>>>>>>>>>>>>>>> performance?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>> Ameya
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
