manifoldcf-user mailing list archives

From Karl Wright <daddy...@gmail.com>
Subject Re: RSS Crawler error 403
Date Thu, 14 May 2015 14:02:26 GMT
Hi Andrea,

It sounds like you may have been blocked by the webmaster at nypost.
Hopefully they have blocked only your IP address, and not all access from
the ManifoldCF crawler in general.

Running curl against that URL works fine from here, as does MCF when I
configure a job with your URL.

The other possibility is that you are crawling through a proxy that isn't
set up properly.
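To tell these cases apart, it can help to repeat the fetch from the crawler machine itself, once directly and once through the proxy, and compare status codes. Below is a minimal sketch using only the Python 3 standard library. It is not part of ManifoldCF; the function name `feed_status` and any proxy address you pass in are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Optional


def feed_status(url: str, proxy: Optional[str] = None, timeout: float = 10.0) -> int:
    """Fetch url once and return the HTTP status code.

    proxy is an optional "http://host:port" string (hypothetical value).
    When it is None, urllib's default ProxyHandler applies, which reads
    the http_proxy/https_proxy environment variables.
    """
    handlers = []
    if proxy is not None:
        # Replace the default ProxyHandler: send http and https via this proxy.
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    request = urllib.request.Request(url, method="GET")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses (such as 403) are raised as HTTPError; keep the code.
        code = err.code
        err.close()
        return code
```

For example, call `feed_status("http://nypost.com/news/feed/")` and then the same URL with `proxy=` set to your proxy's address. A 403 from the crawler host while other hosts get 200 points to an IP block. Keep in mind that urllib sends its own User-Agent header, so a server that filters on User-Agent may answer it differently from the ManifoldCF crawler.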

Karl



On Thu, May 14, 2015 at 8:50 AM, Andrea Asta <asta.andrea@gmail.com> wrote:

> Hi,
> I'm trying to set up a job for crawling some RSS feeds.
> Many feeds don't produce anything, and the Simple History report shows
> that they return a 403 error.
>
> An example of feed:
> http://nypost.com/news/feed/
>
> How can I manage this situation?
>
> Thank you.
> Andrea
>
