manifoldcf-user mailing list archives

From "Wunderlich, Tobias" <tobias.wunderl...@igd-r.fraunhofer.de>
Subject MCF 0.3 - WebCrawlerConnector - Ingestion Problems
Date Thu, 06 Oct 2011 11:18:02 GMT
Hey guys,

I am trying to crawl a website generated with a MediaWiki extension, and I always get this message:

"[WebcrawlerConnector.java:1312] - WEB: Decided not to ingest 'http://wiki.<host>/index.php?title=Spezial%3AAlle+Seiten&from=p&to=s&namespace=0'
because it did not match ingestability criteria"

Seed URL: 'http://wiki.<host>/index.php?title=Spezial%3AAlle+Seiten&from=p&to=s&namespace=0'
Inclusions (crawl and index): .*
Exclusions: none
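
To rule out the inclusion and exclusion settings as the cause, here is a minimal sketch in Python (not ManifoldCF code) that applies the patterns above to the seed URL. The helper `url_passes_filters` is hypothetical, and it assumes the connector treats each pattern as a regular expression that may match anywhere in the URL. Under that assumption, `.*` accepts the URL, so the rejection presumably comes from some other check.

```python
import re

# Values copied from the job settings above; "<host>" is the redacted placeholder.
SEED_URL = "http://wiki.<host>/index.php?title=Spezial%3AAlle+Seiten&from=p&to=s&namespace=0"
INCLUSIONS = [".*"]   # Inclusions (crawl and index)
EXCLUSIONS = []       # Exclusions: none


def url_passes_filters(url, inclusions, exclusions):
    """Hypothetical helper: True if url matches some inclusion and no exclusion.

    Assumption: patterns are regular expressions matched anywhere in the URL.
    """
    included = any(re.search(pattern, url) for pattern in inclusions)
    excluded = any(re.search(pattern, url) for pattern in exclusions)
    return included and not excluded


if __name__ == "__main__":
    print(url_passes_filters(SEED_URL, INCLUSIONS, EXCLUSIONS))
```

With an empty inclusion list the same helper would reject every URL, which is why the check covers both lists.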

Other sites are crawled without problems, so I'm wondering what exactly those ingestability
criteria are.

Best regards,
Tobias

