manifoldcf-user mailing list archives

From "Markus Schuch" <markus_sch...@web.de>
Subject webcrawler connector and dns lookups behind corporate http proxy
Date Mon, 10 Oct 2016 19:44:02 GMT
Hi @ the lovely mcf community out there,
 
In our setup we run ManifoldCF (2.3) behind a corporate HTTP proxy server and are trying to crawl
specific web pages on the Internet.
 
We run into a java.net.UnknownHostException because the connector tries to resolve the IP address
of the hostname. This fails because our network setup does not allow direct DNS lookups for
Internet hosts, and the JDK's InetAddress.getByName() call relies on the system's DNS lookup
mechanism. All Internet traffic goes through the corporate HTTP proxy server, which does all
necessary DNS resolution on its side.
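
To illustrate the difference, here is a minimal sketch in plain JDK code. It is not taken from
ManifoldCF; the class DnsLookupSketch and its two helper methods are hypothetical names chosen
for this example. It contrasts InetAddress.getByName(), which asks the local resolver, with
InetSocketAddress.createUnresolved(), which keeps the hostname as a string and performs no lookup:

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;

// Hypothetical helper class for illustration only; not part of ManifoldCF.
class DnsLookupSketch {

    // Builds a socket address WITHOUT triggering a DNS lookup. The hostname
    // is kept as a plain string, so something else (for example an HTTP
    // proxy) can resolve it later.
    static InetSocketAddress unresolved(String host, int port) {
        return InetSocketAddress.createUnresolved(host, port);
    }

    // Returns true if the local system resolver can resolve the hostname.
    // On a machine without direct DNS access to Internet hosts this is the
    // call that ends in UnknownHostException.
    static boolean resolvesLocally(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "example.com" is only a placeholder hostname.
        String host = args.length > 0 ? args[0] : "example.com";
        InetSocketAddress target = unresolved(host, 80);
        System.out.println("kept unresolved: " + target.isUnresolved());
        System.out.println("resolves locally: " + resolvesLocally(host));
    }
}
```

On our crawler machine I would expect the second line to report false for Internet hostnames,
while the unresolved address can still be created without any lookup.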
 
Can you think of any other (more elegant) solution besides adding the records to /etc/hosts
on the crawler's machine?
 
Many thanks in advance,
Markus
 
 
