nutch-dev mailing list archives

From "Markus Jelsma (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-1771) Solrindex fails if a segment is corrupted or incomplete
Date Wed, 01 Apr 2015 20:03:54 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-1771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391334#comment-14391334 ]

Markus Jelsma commented on NUTCH-1771:
--------------------------------------

Hi - I gave this some more thought. You should not start indexing a corrupted segment
at all. If the fetcher fails for some reason and the segment is incomplete, the segment must
be deleted. Also, indexing must be performed after updating the DB, and since you cannot update
the DB with a corrupted segment, dealing with it in the indexer makes no sense.

You must delete segments that were corrupted when the fetcher failed (note: segments are
not always corrupt when the fetcher fails for other reasons). And you must always delete
segments that cannot make it into the DB when updating.
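
As a rough illustration of checking segments before they reach the indexer, here is a
minimal Python sketch. It assumes a Nutch 1.x segment layout in which the indexer needs the
crawl_fetch, crawl_parse, parse_data and parse_text subdirectories; that list, and the
function names missing_parts and split_segments, are assumptions made for this example and
are not part of Nutch. It only detects missing parts (for example a segment that was
generated but never fetched), not corruption inside the files.

```python
from pathlib import Path
from typing import List, Tuple

# Assumption: the subdirectories a segment needs before it can be indexed.
REQUIRED_SUBDIRS = ("crawl_fetch", "crawl_parse", "parse_data", "parse_text")


def missing_parts(segment: Path) -> List[str]:
    """Return the required subdirectories that are absent from a segment."""
    return [name for name in REQUIRED_SUBDIRS if not (segment / name).is_dir()]


def split_segments(segments_dir: Path) -> Tuple[List[Path], List[Path]]:
    """Partition the segment directories into (complete, incomplete)."""
    complete: List[Path] = []
    incomplete: List[Path] = []
    for segment in sorted(p for p in segments_dir.iterdir() if p.is_dir()):
        if missing_parts(segment):
            incomplete.append(segment)
        else:
            complete.append(segment)
    return complete, incomplete
```

The incomplete list is what a crawl script would delete after a failed fetch, before the DB
update and indexing steps run; the sketch deliberately leaves the deletion itself to the caller.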

> Solrindex fails if a segment is corrupted or incomplete
> -------------------------------------------------------
>
>                 Key: NUTCH-1771
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1771
>             Project: Nutch
>          Issue Type: Bug
>          Components: indexer
>    Affects Versions: 1.8, 1.10
>            Reporter: Diaa
>            Priority: Minor
>             Fix For: 1.11
>
>
> When using solrindex to index multiple segments via -dir segment,
> the indexing fails if one or more segments are corrupted/incomplete (generated but not
> fetched, for example).
> The failure is simply a java.io exception.
> Deleting the segment fixes the issue.
> The expected behavior should be one of the following:
> * skipping the segment and proceeding with others (while logging)
> * stopping the indexing and logging the failed segment



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
