nutch-dev mailing list archives

From "Sami Siren (JIRA)" <>
Subject [jira] Resolved: (NUTCH-421) Allow predeterminate running order of index filters
Date Sat, 06 Jan 2007 20:01:27 GMT


Sami Siren resolved NUTCH-421.

       Resolution: Fixed
    Fix Version/s: 0.9.0

Thanks Alan,

I just committed this with an additional JUnit test and a fix similar to NUTCH-325.

Indentation in IndexingFilters is still messed up; I'll fix that in the next pass.

The next step regarding filters could be to combine the common features of IndexingFilters,
URLFilters, and friends into a common superclass.

> Allow predeterminate running order of index filters
> ---------------------------------------------------
>                 Key: NUTCH-421
>                 URL:
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: 0.8.1
>         Environment: All
>            Reporter: Alan Tanaman
>         Assigned To: Sami Siren
>            Priority: Minor
>             Fix For: 0.9.0
>         Attachments: nutch-421.patch
> I've tested a patch for org.apache.nutch.indexer.IndexingFilters, allowing the user to
> state in which order the indexing filters are to be run based on a new
> indexingfilter.order property. This is needed when a filter needs to rely on previously
> generated document fields as a source of input to generate further fields.
> As suggested elsewhere, I based this on the urlfilter.order functionality:
> <property>
>   <name>indexingfilter.order</name>
>   <value>org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter</value>
>   <description>The order by which index filters are applied.
>   If empty, all available index filters (as dictated by properties
>   plugin-includes and plugin-excludes above) are loaded and applied in system
>   defined order. If not empty, only named filters are loaded and applied
>   in given order. For example, if this property has value:
>   org.apache.nutch.indexer.basic.BasicIndexingFilter org.apache.nutch.indexer.more.MoreIndexingFilter
>   then BasicIndexingFilter is applied first, and MoreIndexingFilter second.
>   Since all filters are AND'ed, filter ordering does not have impact
>   on end result, but it may have performance implication, depending
>   on relative expensiveness of filters.
>   </description>
> </property>
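
To illustrate the ordering behaviour described in the property above, here is a minimal, self-contained sketch. It is not the committed NUTCH-421 patch and does not use the Nutch API: the `Filter` interface, the `example.*` filter names, and the `order` and `run` helpers are all hypothetical stand-ins. It only shows why order matters when one filter reads a field that another filter writes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FilterOrderSketch {

    /** Hypothetical stand-in for an indexing filter; NOT the Nutch interface. */
    interface Filter {
        void apply(Map<String, String> doc);
    }

    /**
     * Returns the filters to run. If orderValue is empty, all available
     * filters are returned in the map's own order; otherwise only the named
     * filters are returned, in the order given. Unknown names are skipped.
     */
    static List<Filter> order(Map<String, Filter> available, String orderValue) {
        if (orderValue == null || orderValue.trim().isEmpty()) {
            return new ArrayList<>(available.values());
        }
        List<Filter> ordered = new ArrayList<>();
        for (String name : orderValue.trim().split("\\s+")) {
            Filter f = available.get(name);
            if (f != null) {
                ordered.add(f);
            }
        }
        return ordered;
    }

    /** Runs two hypothetical filters over an empty document in the given order. */
    static Map<String, String> run(String orderValue) {
        Map<String, Filter> available = new LinkedHashMap<>();
        // Registered first on purpose: depends on the "title" field set below.
        available.put("example.MoreFilter",
                doc -> doc.put("summary", "about " + doc.get("title")));
        available.put("example.BasicFilter",
                doc -> doc.put("title", "Nutch"));

        Map<String, String> doc = new LinkedHashMap<>();
        for (Filter f : order(available, orderValue)) {
            f.apply(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        // Explicit order: BasicFilter sets "title" before MoreFilter reads it.
        System.out.println(run("example.BasicFilter example.MoreFilter"));
        // prints {title=Nutch, summary=about Nutch}
    }
}
```

With an empty order value the sketch falls back to registration order, so the hypothetical MoreFilter runs before the title exists and produces "about null". That is the kind of dependency problem an explicit order is meant to avoid.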

This message is automatically generated by JIRA.