nutch-dev mailing list archives

From BlackIce <blackice...@gmail.com>
Subject Re: [jira] [Commented] (NUTCH-2414) Allow LanguageIndexingFilter to actually filter documents by language.
Date Mon, 28 Aug 2017 20:52:23 GMT
+1 This way one could have a very focused crawl/search

On Mon, Aug 28, 2017 at 10:08 PM, Jorge Luis Betancourt Gonzalez (JIRA) <
jira@apache.org> wrote:

>
>     [ https://issues.apache.org/jira/browse/NUTCH-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16144264#comment-16144264 ]
>
> Jorge Luis Betancourt Gonzalez commented on NUTCH-2414:
> -------------------------------------------------------
>
> +1 This would also help to deprecate the {{mimetype-filter}} plugin
> and avoid having the responsibility of indexing/allowing/blocking documents
> (from being indexed) scattered across several plugins.
>
> > Allow LanguageIndexingFilter to actually filter documents by language.
> > ----------------------------------------------------------------------
> >
> >                 Key: NUTCH-2414
> >                 URL: https://issues.apache.org/jira/browse/NUTCH-2414
> >             Project: Nutch
> >          Issue Type: Improvement
> >          Components: plugin
> >    Affects Versions: 1.13
> >            Reporter: Yossi Tamari
> >            Priority: Minor
> >
> > It is often useful to only index pages in select languages (e.g. only
> > those languages that we intend to search in). At first glance it seems that
> > this is done by LanguageIndexingFilter, but currently all the filter does
> > is add the language as a field to the index.
> > We can add a configuration property to LanguageIndexingFilter that will
> > allow it to only index languages specified in this property.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.4.14#64029)
>
