nutch-dev mailing list archives

From Jack Tang <him...@gmail.com>
Subject RSS Parser Bug!?
Date Thu, 08 Sep 2005 05:58:35 GMT
Hi Guys

Has anyone installed parse-rss and tried to fetch RSS feeds?
It failed on my side. I enabled the plugin and the fetch succeeded, but
the RSS parser did not work.
My feed is http://www.craigslist.org/evs/index.rss

Here is the error:

org.apache.nutch.fetcher.Fetcher$FetcherThread [11] - fetch okay, but
can't parse http://beijing.craigslist.org/jjj/index.rss, reason:
failed(2,203): Content-Type not text/html: application/xml;
charset=ISO-8859-1

The content type is application/xml. Mattmann's check in the parser is this:
        // check that contentType is one we can handle
        String contentType = content.getContentType();
        if (contentType != null
                && (!contentType.startsWith("text/xml")
                        && !contentType.startsWith("application/rss+xml")))
            return new ParseStatus(ParseStatus.FAILED_INVALID_FORMAT,
                    "Content-Type not text/xml or application/rss+xml: "
                            + contentType).getEmptyParse();

So it does not support the "application/xml" content type yet?
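
To make the question concrete, here is a minimal sketch of what a more permissive check might look like, assuming "application/xml" can simply be added to the accepted prefixes. The class and method names are hypothetical and not taken from the plugin; I have not verified that this is the right fix.

```java
import java.util.Locale;

// Hypothetical helper, not part of Nutch: sketches a content-type check
// that also accepts "application/xml" (an assumption, not the plugin's code).
class RssContentTypeCheck {

    // Prefixes treated as parseable; "application/xml" is the assumed addition.
    private static final String[] ACCEPTED_PREFIXES = {
        "text/xml", "application/rss+xml", "application/xml"
    };

    // Returns true when the content type is absent (mirroring the null
    // handling in the quoted check) or starts with an accepted prefix.
    static boolean isAccepted(String contentType) {
        if (contentType == null) {
            return true;
        }
        String normalized = contentType.trim().toLowerCase(Locale.ROOT);
        for (String prefix : ACCEPTED_PREFIXES) {
            if (normalized.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

With this, RssContentTypeCheck.isAccepted("application/xml; charset=ISO-8859-1") returns true, while "text/html" is still rejected.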


Thanks
/Jack
-- 
Keep Discovering ... ...
http://www.jroller.com/page/jmars
