I would say that would be pretty efficient. ManifoldCF will still need to keep records in its jobqueue table for the documents at hopcount=2, but it will never fetch them.
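To illustrate why those hopcount=2 records exist, here is a minimal sketch of a hop-count-limited breadth-first crawl. It is not ManifoldCF's implementation. The `crawl` function, the `links` graph and the URLs are hypothetical, and only a single link type is modeled. Every discovered URL is recorded with its hop count, but a URL is queued for fetching only if its hop count is within `max_hops`.

```python
from collections import deque


def crawl(seeds, links, max_hops):
    """Hypothetical hop-count-limited BFS crawl (single link type only).

    Every discovered URL is recorded in `hopcount` (a stand-in for a job
    queue table), but only URLs within `max_hops` are actually fetched.
    """
    hopcount = {url: 0 for url in seeds}  # every URL the crawl knows about
    fetched = []
    frontier = deque(seeds)
    while frontier:
        url = frontier.popleft()
        fetched.append(url)  # stand-in for fetching the page and extracting links
        for target in links.get(url, []):
            if target not in hopcount:
                # BFS order guarantees the first hop count assigned is the minimum.
                hopcount[target] = hopcount[url] + 1
                if hopcount[target] <= max_hops:
                    frontier.append(target)  # within the limit: will be fetched
                # otherwise the URL stays recorded but is never fetched
    return hopcount, fetched


# Hypothetical link graph: "seed" is a category page, the rest are articles.
links = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["seed", "d"],
    "c": ["e"],
}
hopcount, fetched = crawl(["seed"], links, max_hops=1)
print(fetched)   # ['seed', 'a', 'b']
print(hopcount)  # {'seed': 0, 'a': 1, 'b': 1, 'c': 2, 'd': 2}
```

With a maximum hop count of 1, the pages at hop 2 ("c" and "d") are known and recorded but never fetched, and anything beyond them ("e") is never even discovered.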


I am crawling the main pages of some online newspaper websites.
I don't need deletes at all. I am using the crawl-once model.

Here are the settings I use:

Schedule type: Scan every document once
Start Method : Start at beginning of schedule window

Scheduled time: Any day of week at 1 am 3 am 5 am 7 am 9 am 11 am 1 pm 3 pm 5 pm 7 pm 9 pm 11 pm plus 0 minutes
Maximum run time: No limit

Maximum hop count for link type 'link': 1
Maximum hop count for link type 'redirect': Unlimited
Hop count mode: No deletes, forever

Include only hosts matching seeds? yes
Seeds: a few URLs of the form http://main.page.com/{category}, where category is Sports, Politics, etc.

By setting the maximum hop count to 1 (or 2) and the hop count mode to 'No deletes, forever', I expect this crawl to be very fast and as efficient as possible, with minimal database queries. Am I correct?