nutch-dev mailing list archives

From "Jorge Luis Betancourt Gonzalez (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2414) Allow LanguageIndexingFilter to actually filter documents by language.
Date Mon, 28 Aug 2017 20:08:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144264#comment-16144264 ]

Jorge Luis Betancourt Gonzalez commented on NUTCH-2414:
-------------------------------------------------------

+1 This would also help to deprecate the {{mimetype-filter}} plugin and avoid having
the responsibility for deciding which documents get indexed or blocked scattered
across several plugins.

> Allow LanguageIndexingFilter to actually filter documents by language.
> ----------------------------------------------------------------------
>
>                 Key: NUTCH-2414
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2414
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin
>    Affects Versions: 1.13
>            Reporter: Yossi Tamari
>            Priority: Minor
>
> It is often useful to only index pages in select languages (e.g. only those languages
> that we intend to search in). At first glance it seems that this is done by LanguageIndexingFilter,
> but currently all the filter does is add the language as a field to the index.
> We can add a configuration property to LanguageIndexingFilter that will allow it to only
> index languages specified in this property.
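
A rough sketch of the behavior the issue proposes, for illustration only. It is not
Nutch code and does not use the Nutch plugin API: the class name LanguageAllowList,
the method shouldIndex, and the idea of a comma-separated property value are all
assumptions made here. An empty value is assumed to mean "index every language",
which would keep the current behavior as the default.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: decides whether a document should be indexed,
// based on a comma-separated list of allowed language codes.
public class LanguageAllowList {
    private final Set<String> allowed;

    // configValue is the raw property value, e.g. "en, de" (assumed format).
    public LanguageAllowList(String configValue) {
        String value = (configValue == null) ? "" : configValue;
        this.allowed = Arrays.stream(value.split(","))
            .map(s -> s.trim().toLowerCase(Locale.ROOT))
            .filter(s -> !s.isEmpty())
            .collect(Collectors.toSet());
    }

    // Returns true if a document with the detected language should be indexed.
    public boolean shouldIndex(String detectedLanguage) {
        if (allowed.isEmpty()) {
            return true;  // no restriction configured: index everything
        }
        if (detectedLanguage == null) {
            return false; // restriction configured but language unknown
        }
        return allowed.contains(detectedLanguage.trim().toLowerCase(Locale.ROOT));
    }
}
```

In a real indexing filter, the equivalent of shouldIndex returning false would presumably
translate into dropping the document instead of adding the language field to it.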



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
