nutch-dev mailing list archives

From "Sami Siren (JIRA)" <j...@apache.org>
Subject [jira] Updated: (NUTCH-583) FeedParser empty links for items
Date Wed, 18 Feb 2009 13:47:04 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sami Siren updated NUTCH-583:
-----------------------------

    Fix Version/s:     (was: 1.0.0)
                   1.1

pushing this to 1.1

> FeedParser empty links for items
> --------------------------------
>
>                 Key: NUTCH-583
>                 URL: https://issues.apache.org/jira/browse/NUTCH-583
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 1.1
>
>
> FeedParser in the feed plugin simply discards an item if it does not have a <link> element.
> However, RSS 2.0 does not require a <link> element in every <item>.
> Moreover, the link is sometimes given in the <guid> element, which is a globally
> unique identifier for the item. I think we can first search for a URL for the item, and if
> none is found, fall back to the feed's URL, merging the parse texts of all such items into
> one Parse object.
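
The fallback order proposed above (item <link>, then <guid>, then the feed's own URL, with leftover items merged under that URL) could look roughly like the sketch below. It is a minimal illustration, not the FeedParser code: FeedItemUrls, Item, itemUrl and groupTexts are hypothetical names, each Parse is reduced to plain text, and a <guid> is treated as a URL only when it starts with http:// or https:// (an assumption, since RSS 2.0 guids with isPermaLink="false" need not be URLs).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating the proposed URL fallback; not a Nutch or ROME class.
final class FeedItemUrls {
    // Simplified, hypothetical view of an RSS 2.0 <item>: link, guid and extracted text.
    record Item(String link, String guid, String text) {}

    private FeedItemUrls() {}

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    // Assumption: only accept a guid that looks like an HTTP(S) URL.
    private static boolean looksLikeUrl(String s) {
        return s.startsWith("http://") || s.startsWith("https://");
    }

    /** Returns the item's <link>, else a URL-like <guid>, else null. */
    static String itemUrl(Item item) {
        if (!isBlank(item.link())) {
            return item.link().trim();
        }
        if (!isBlank(item.guid()) && looksLikeUrl(item.guid().trim())) {
            return item.guid().trim();
        }
        return null;
    }

    /**
     * Groups item texts by resolved URL. Items with no usable URL fall back to feedUrl,
     * so their texts are merged under that one key instead of being discarded.
     */
    static Map<String, String> groupTexts(String feedUrl, List<Item> items) {
        Map<String, List<String>> byUrl = new LinkedHashMap<>();
        for (Item item : items) {
            String url = itemUrl(item);
            if (url == null) {
                url = feedUrl; // fallback: attribute the item to the feed itself
            }
            byUrl.computeIfAbsent(url, k -> new ArrayList<>()).add(item.text());
        }
        // Merge each group's texts into one string, standing in for one Parse object.
        Map<String, String> merged = new LinkedHashMap<>();
        byUrl.forEach((url, texts) -> merged.put(url, String.join("\n", texts)));
        return merged;
    }
}
```

With this ordering, an item is only dropped into the feed-level bucket when neither element yields a URL, so no item text is lost.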

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

