nutch-dev mailing list archives

From "Jorge Luis Betancourt Gonzalez (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2414) Allow LanguageIndexingFilter to actually filter documents by language.
Date Mon, 28 Aug 2017 21:12:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144357#comment-16144357 ]

Jorge Luis Betancourt Gonzalez commented on NUTCH-2414:
-------------------------------------------------------

[~yossi] I think that [~markus.jelsma@openindex.io] is suggesting that we implement a generic {{IndexingFilter}}
that supports JEXL expressions. That way we don't need to modify every possible {{IndexingFilter}};
it will be easier to maintain in the long run and provides a better separation.
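
To make the idea of one generic, expression-driven filter concrete, here is a minimal sketch. It is not Nutch code: the class name, the map-based document, and the use of {{java.util.function.Predicate}} as a stand-in for a configured JEXL expression are all assumptions chosen so that the example runs with only the JDK. A real plugin would implement Nutch's {{IndexingFilter}} interface and evaluate a JEXL expression against the document fields instead.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of a generic, expression-driven indexing filter.
// A Predicate stands in for a configured JEXL expression; none of these
// names come from Nutch itself.
final class ExpressionIndexingFilterSketch {

    // The "expression": returns true when a document should be kept.
    private final Predicate<Map<String, String>> keepExpression;

    ExpressionIndexingFilterSketch(Predicate<Map<String, String>> keepExpression) {
        this.keepExpression = keepExpression;
    }

    // Returns the document unchanged if the expression accepts it,
    // or null to signal that the document should not be indexed.
    Map<String, String> filter(Map<String, String> doc) {
        return keepExpression.test(doc) ? doc : null;
    }

    public static void main(String[] args) {
        // Keep only documents whose "lang" field equals "en".
        ExpressionIndexingFilterSketch filter =
            new ExpressionIndexingFilterSketch(doc -> "en".equals(doc.get("lang")));

        Map<String, String> english = new HashMap<>();
        english.put("lang", "en");
        Map<String, String> german = new HashMap<>();
        german.put("lang", "de");

        System.out.println("english kept: " + (filter.filter(english) != null));
        System.out.println("german kept: " + (filter.filter(german) != null));
    }
}
```

Because the rule lives in the expression rather than in the filter, the same class could filter on language, content type, or any other field without changes to the individual indexing filters that populate those fields.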

> Allow LanguageIndexingFilter to actually filter documents by language.
> ----------------------------------------------------------------------
>
>                 Key: NUTCH-2414
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2414
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin
>    Affects Versions: 1.13
>            Reporter: Yossi Tamari
>            Priority: Minor
>
> It is often useful to only index pages in select languages (e.g. only those languages
> that we intend to search in). At first glance it seems that this is done by LanguageIndexingFilter,
> but currently all the filter does is add the language as a field to the index.
> We can add a configuration property to LanguageIndexingFilter that will allow it to only
> index languages specified in this property.
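
The configuration-property approach described in the issue could look roughly like the sketch below. Everything here is an assumption for illustration: the class name, the comma-separated format of the property value, and the choice to drop documents with no detected language once a restriction is configured. It does not reflect how LanguageIndexingFilter is actually implemented.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: decide whether to index a document based on an
// allow-list of language codes read from a configuration property.
final class LanguageAllowListSketch {

    private final Set<String> allowed;

    // propertyValue is assumed to be a comma-separated list such as "en,de".
    // A null or blank value means "no restriction": every language is indexed.
    LanguageAllowListSketch(String propertyValue) {
        if (propertyValue == null || propertyValue.trim().isEmpty()) {
            allowed = Collections.emptySet();
        } else {
            allowed = Arrays.stream(propertyValue.split(","))
                .map(code -> code.trim().toLowerCase(Locale.ROOT))
                .filter(code -> !code.isEmpty())
                .collect(Collectors.toSet());
        }
    }

    // Returns true when a document in the given language should be indexed.
    boolean shouldIndex(String detectedLanguage) {
        if (allowed.isEmpty()) {
            return true; // no restriction configured
        }
        if (detectedLanguage == null) {
            return false; // assumption: unknown language is dropped when restricted
        }
        return allowed.contains(detectedLanguage.trim().toLowerCase(Locale.ROOT));
    }
}
```

With a property value of "en, de", `shouldIndex("EN")` accepts the document and `shouldIndex("fr")` rejects it; with the property unset, the filter keeps today's behavior of indexing every language.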



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
