nutch-dev mailing list archives

From "Zuber (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2319) Link with "rel=alternate" doesn't return in crawl
Date Wed, 05 Oct 2016 13:50:20 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15548759#comment-15548759 ]

Zuber commented on NUTCH-2319:
------------------------------

I am using parse-tika as the HTML parser. Do I still need to upgrade to 1.12?

> Link with "rel=alternate" doesn't return in crawl 
> --------------------------------------------------
>
>                 Key: NUTCH-2319
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2319
>             Project: Nutch
>          Issue Type: Bug
>            Reporter: Zuber
>
> I am using Nutch 1.4. The problem is that Nutch does not return the URLs from
> link elements with rel="alternate".
> For example, I am trying to crawl http://rssfeeds.azcentral.com/phoenix/asu,
> which contains the link below, but that URL does not appear in the results.
> <link rel="alternate" type="application/atom+xml" href="http://rssfeeds.azcentral.com/phoenix/asu&amp;x=1" title="Phoenix - ASU">
> Could you please help?
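To make the expected behavior concrete, here is a minimal sketch of what "returning the URL from a rel=alternate link" means: scan the page's `<link>` elements and collect the `href` of any whose `rel` contains the token `alternate`. This is not Nutch's or Tika's code; it uses Python's standard `html.parser` only for illustration, and the names `AlternateLinkCollector` and `extract_alternate_links` are hypothetical. Whether a given Nutch version and parser plugin actually emit such links as outlinks is a separate question this sketch does not answer.

```python
from html.parser import HTMLParser


class AlternateLinkCollector(HTMLParser):
    """Hypothetical helper: collects href values of <link> elements whose rel includes "alternate"."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel is a space-separated token list; compare tokens case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        href = attr_map.get("href")
        if "alternate" in rel_tokens and href:
            # Attribute values arrive already unescaped, so "&amp;" becomes "&".
            self.alternates.append(href)

    # A self-closing <link ... /> goes through handle_startendtag, whose
    # default implementation calls handle_starttag, so no override is needed.


def extract_alternate_links(html_text):
    collector = AlternateLinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.alternates


sample = '''<html><head>
<link rel="alternate" type="application/atom+xml"
      href="http://rssfeeds.azcentral.com/phoenix/asu&amp;x=1" title="Phoenix - ASU">
</head><body></body></html>'''

print(extract_alternate_links(sample))
# ['http://rssfeeds.azcentral.com/phoenix/asu&x=1']
```

Note that the `&amp;` entity in the page source is decoded to a plain `&` in the extracted URL, which is what a browser would request.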



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
