nutch-dev mailing list archives

From "Zuber (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2319) Link with "rel=alternate" doesn't return in crawl
Date Fri, 07 Oct 2016 06:43:20 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15554298#comment-15554298 ]

Zuber commented on NUTCH-2319:
------------------------------

Thanks, Markus. I am trying to upgrade to 1.12. In 1.4 I used the org.apache.nutch.crawl.Crawl
class from my Java code to start crawling, but 1.12 has no such class. Could you please tell
me what the alternative is in 1.12, or point me to a guide or wiki page on how to use 1.12
from Java?

> Link with "rel=alternate" doesn't return in crawl 
> --------------------------------------------------
>
>                 Key: NUTCH-2319
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2319
>             Project: Nutch
>          Issue Type: Bug
>            Reporter: Zuber
>
> I am using Nutch 1.4. Nutch does not return the URLs from link rel="alternate" elements.
> For example, I am trying to crawl the URL http://rssfeeds.azcentral.com/phoenix/asu
> which contains the link below, but that link does not appear in the results:
> <link rel="alternate" type="application/atom+xml" href="http://rssfeeds.azcentral.com/phoenix/asu&amp;x=1" title="Phoenix - ASU">
> Could you please help?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
