nutch-dev mailing list archives

From "Dennis Kubes (JIRA)" <j...@apache.org>
Subject [jira] Updated: (NUTCH-613) Empty Summaries and Cached Pages
Date Tue, 19 Feb 2008 06:32:43 GMT

     [ https://issues.apache.org/jira/browse/NUTCH-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dennis Kubes updated NUTCH-613:
-------------------------------

    Attachment: NUTCH-613-1-20080219.patch

This patch checks the hit details for an orig field and, if it exists, uses it as the url field.
This allows the system to correctly find the summary and cached contents.  I don't know if
this solves the entire problem of redirects and how they are stored, but it does solve the
symptom of summaries not showing up and cached pages erroring.
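
As a rough illustration of the fallback described above (not the patch itself), the lookup could be sketched as follows. The class name, the method name, and the use of a plain Map in place of the real hit details object are all assumptions made for this example; only the "orig" and "url" field names come from the description.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for the stored fields of a
// search hit. This is not the Nutch HitDetails or FetchedSegments API.
class OrigUrlSketch {

    // Returns the value of the "orig" field when the hit has one,
    // otherwise falls back to the regular "url" field.
    static String resolveLookupUrl(Map<String, String> hitFields) {
        String orig = hitFields.get("orig");
        if (orig != null && !orig.isEmpty()) {
            return orig;
        }
        return hitFields.get("url");
    }

    public static void main(String[] args) {
        // A hit that carries both fields: the "orig" value is used.
        Map<String, String> withOrig = new HashMap<>();
        withOrig.put("url", "http://example.com/a");
        withOrig.put("orig", "http://example.com/b");
        System.out.println(resolveLookupUrl(withOrig));

        // A hit with no "orig" field: the "url" value is used unchanged.
        Map<String, String> withoutOrig = new HashMap<>();
        withoutOrig.put("url", "http://example.com/c");
        System.out.println(resolveLookupUrl(withoutOrig));
    }
}
```

The resolved value would then be the key used to look up the summary and the cached content, so hits without an orig field behave exactly as before.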

> Empty Summaries and Cached Pages
> --------------------------------
>
>                 Key: NUTCH-613
>                 URL: https://issues.apache.org/jira/browse/NUTCH-613
>             Project: Nutch
>          Issue Type: Bug
>          Components: fetcher, searcher, web gui
>    Affects Versions: 0.9.0
>         Environment: All
>            Reporter: Dennis Kubes
>            Assignee: Dennis Kubes
>             Fix For: 0.9.0, 1.0.0
>
>         Attachments: NUTCH-613-1-20080219.patch
>
>
> There is a bug where some search results do not have summaries and viewing their cached
> pages causes a NullPointer.  This bug is due to redirects getting stored under the new url
> and the getURL method of FetchedSegments getting the wrong (old) url which is stored in crawldb
> but has no content or parse objects.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

