manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: stuffamountfactor and getting more work done
Date Fri, 12 Dec 2014 17:51:58 GMT
Hi Aeham,

Given that your stuffer thread has to wait for multiple other machines to
finish stuffing before it runs, it may make sense to increase the amount
stuffed at one time.  Unfortunately the stuffer lock has to remain because
otherwise the same document could be stuffed twice.  Using a database
transaction is unworkable in this context because of the tendency to
deadlock.

Thanks,
Karl


On Fri, Dec 12, 2014 at 12:46 PM, Aeham Abushwashi <
aeham.abushwashi@exonar.com> wrote:
>
> Thanks Karl.
>
> The stuffer thread query isn't doing too badly. Judging by stats from the
> pg_stat_activity view in PostgreSQL, the stuffer query usually takes < 2
> seconds to return.
>
>
> >> In a continuous job, documents may well be scheduled to be crawled at
> >> some time in the future, and are ineligible for crawling until that future
> >> time arrives.
>
> Such documents would be excluded by the stuffer query, right?
>
> Thanks for the pointer to the queue status page. Using the root server
> name as an identifier class, I get the bulk of documents grouped under the
> "About to Process" and "Waiting for Processing" categories. For example, I
> have a job with 677,856 and 102,342 docs respectively. Another job has
> 320,804 and 443,596 docs respectively. All other status categories have 0
> docs.
>
>
> >> If there are tons of idle worker threads AND your stuffer thread is
> >> waiting on PostgreSQL, that's a good sign it is not keeping up due to
> >> database reasons.
>
> Interestingly, the stuffer thread spends the majority of its time trying
> to acquire the stuffer lock. I have 3 nodes in the cluster and each node's
> stuffer thread spends ~ 2/3 of its time blocked waiting for the lock. Of
> course the SQL query itself and connection grabbing/releasing all happen
> within the scope of the lock. The effect is that the more nodes there are
> in the cluster, the less time each node has for stuffing documents.
>
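
The arithmetic behind the "~ 2/3 of its time blocked" observation, and behind the suggestion to stuff more per lock acquisition, can be sketched with a rough model. This is only a sketch and not ManifoldCF code: it assumes the stuffer lock is always contended, that every node holds it for the same length of time, and that the hold time is a fixed overhead plus a per-document cost. The function names and the timing numbers are hypothetical; only the 3-node cluster and the roughly 2-second query come from the thread.

```python
def blocked_fraction(nodes: int) -> float:
    """Fraction of time each stuffer thread waits for the lock,
    assuming the lock is always held by some node and shared evenly."""
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    return (nodes - 1) / nodes


def cluster_stuff_rate(batch_size: int, fixed_overhead_s: float,
                       per_doc_s: float) -> float:
    """Documents stuffed per second across the whole cluster.

    Because only one node stuffs at a time, the rate depends on how many
    documents are stuffed per lock hold, not on the number of nodes.
    """
    hold_time_s = fixed_overhead_s + per_doc_s * batch_size
    return batch_size / hold_time_s


if __name__ == "__main__":
    # Three nodes, as in the cluster described above.
    print(f"blocked fraction per node: {blocked_fraction(3):.3f}")
    # Hypothetical costs: 2 s fixed overhead, 10 ms per document.
    for batch in (100, 200, 400):
        rate = cluster_stuff_rate(batch, 2.0, 0.01)
        print(f"batch={batch:4d}  cluster rate={rate:.1f} docs/s")
```

Under these assumptions, adding nodes does not raise the stuffing rate, while a larger batch amortizes the fixed cost of each lock hold. How much of the real hold time is actually fixed would have to be measured.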
