nutch-dev mailing list archives

From "Chris A. Mattmann (JIRA)" <>
Subject [jira] Resolved: (NUTCH-443) allow parsers to return multiple Parse object, this will speed up the rss parser
Date Sun, 17 Jun 2007 17:21:26 GMT


Chris A. Mattmann resolved NUTCH-443.

    Resolution: Fixed

Patch tested and contributed by Dogacan. This update is a fix and a semantics change relative to
the original patch for NUTCH-443. The original patch did not tell the Indexer to read crawl_parse
as well, so it could not pick up the fetch datums of sub-URLs. This patch addresses that issue.
Now, if the Fetcher gets null content, it filters it out instead of pushing empty content.

> allow parsers to return multiple Parse object, this will speed up the rss parser
> --------------------------------------------------------------------------------
>                 Key: NUTCH-443
>                 URL:
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.9.0
>            Reporter: Renaud Richardet
>            Assignee: Chris A. Mattmann
>            Priority: Minor
>             Fix For: 1.0.0
>         Attachments: NUTCH-443-draft-v1.patch, NUTCH-443-draft-v2.patch, NUTCH-443-draft-v3.patch,
>                      NUTCH-443-draft-v4.patch, NUTCH-443-draft-v5.patch, NUTCH-443-draft-v6.patch,
>                      NUTCH-443-draft-v7.patch, NUTCH-443.022507.patch.txt, NUTCH-443.02282007-v2.patch,
>                      NUTCH-443.02282007.patch, NUTCH-443.08052007.patch, NUTCH_443_reopened_v3.patch,
>                      parse-map-core-draft-v1.patch, parse-map-core-untested.patch, parsers.diff,
>                      patch.txt, redirect_and_index.patch, redirect_and_index_v2.patch
>
> Allow Parser#parse to return a Map<String,Parse>. This way, the RSS parser can
> return multiple Parse objects, which will all be indexed separately. Advantage: no need to
> fetch all feed items separately.
> see the discussion at
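
To make the idea in the issue description concrete, here is a minimal, self-contained sketch of a parser that returns one parse result per feed item, keyed by URL. It is not the committed patch and does not use Nutch's actual classes: SimpleParse, FeedItem and parseFeed are hypothetical stand-ins invented for illustration, and the real Parse type and indexing path in Nutch may look quite different.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MultiParseSketch {

    /** Hypothetical stand-in for a parse result; NOT the real Nutch Parse interface. */
    static final class SimpleParse {
        final String title;
        final String text;

        SimpleParse(String title, String text) {
            this.title = title;
            this.text = text;
        }
    }

    /** Hypothetical, already-extracted feed entry (assumption for this sketch). */
    static final class FeedItem {
        final String url;
        final String title;
        final String body;

        FeedItem(String url, String title, String body) {
            this.url = url;
            this.title = title;
            this.body = body;
        }
    }

    /**
     * Returns one parse per URL: an entry for the feed itself plus one entry
     * per item. A caller could then index each map entry as its own document
     * without fetching the item URLs separately.
     */
    static Map<String, SimpleParse> parseFeed(String feedUrl, List<FeedItem> items) {
        // LinkedHashMap keeps feed order, which makes the result easy to inspect.
        Map<String, SimpleParse> parses = new LinkedHashMap<>();
        parses.put(feedUrl, new SimpleParse("feed", items.size() + " items"));
        for (FeedItem item : items) {
            if (item.url == null || item.url.isEmpty()) {
                continue; // an item without a URL cannot be keyed, so skip it
            }
            parses.put(item.url, new SimpleParse(item.title, item.body));
        }
        return parses;
    }

    public static void main(String[] args) {
        List<FeedItem> items = Arrays.asList(
            new FeedItem("http://example.org/a", "First item", "Body of the first item"),
            new FeedItem("http://example.org/b", "Second item", "Body of the second item"));

        Map<String, SimpleParse> parses = parseFeed("http://example.org/feed.xml", items);
        for (Map.Entry<String, SimpleParse> e : parses.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().title);
        }
    }
}
```

The map key is what lets each item be treated as a separate document: the feed is fetched once, and every entry carries its own URL and content.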

