From Aeham Abushwashi
Subject Re: stuffamountfactor and getting more work done
Date Fri, 12 Dec 2014 17:46:34 GMT
Thanks Karl.

The stuffer thread query isn't doing too badly. Judging by stats from the
pg_stat_activity view in PostgreSQL, the stuffer query usually takes < 2
seconds to return.

>> In a continuous job, documents may well be scheduled to be crawled at
>> some time in the future, and are ineligible for crawling until that future
>> time arrives.

Such documents would be excluded by the stuffer query, right?

Thanks for the pointer to the queue status page. Using the root server name
as an identifier class, I get the bulk of documents grouped under the
"About to Process" and "Waiting for Processing" categories. For example, I
have a job with 677,856 and 102,342 docs respectively. Another job has
320,804 and 443,596 docs respectively. All other status categories have 0.

>> If there are tons of idle worker threads AND your stuffer thread is
>> waiting on Postgresql, that's a good sign it is not keeping up due to
>> database reasons.

Interestingly, the stuffer thread spends the majority of its time trying to
acquire the stuffer lock. I have 3 nodes in the cluster and each node's
stuffer thread spends ~ 2/3 of its time blocked waiting for the lock. Of
course the SQL query itself and connection grabbing/releasing all happen
within the scope of the lock. The effect is that the more nodes there are
in the cluster, the less time each node has for stuffing documents.
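
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It is not ManifoldCF code. It assumes the stuffer lock is always contended and that every node gets an equal share of it; the function names are made up for illustration. Under those assumptions each node's stuffer thread is blocked for (N - 1) / N of the time, which for 3 nodes comes to the ~2/3 I am seeing.

```python
def blocked_fraction(num_nodes: int) -> float:
    """Fraction of time one node's stuffer thread waits for the lock.

    Assumption (not taken from ManifoldCF): the lock is always held by
    some node, and all nodes hold it for equal shares of the time.
    """
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return (num_nodes - 1) / num_nodes


def stuffing_share(num_nodes: int) -> float:
    """Fraction of time one node can spend stuffing (holding the lock)."""
    return 1.0 - blocked_fraction(num_nodes)
```

With 3 nodes, blocked_fraction(3) is 2/3 and stuffing_share(3) is 1/3. Going to 6 nodes would leave each node only 1/6 of the time for stuffing, if the same assumptions hold.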

