nutch-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (NUTCH-2414) Allow LanguageIndexingFilter to actually filter documents by language.
Date Wed, 13 Dec 2017 20:54:00 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16289889#comment-16289889 ]

ASF GitHub Bot commented on NUTCH-2414:
---------------------------------------

lewismc commented on issue #217: NUTCH-2414 - Allow LanguageIndexingFilter to actually filter documents by language
URL: https://github.com/apache/nutch/pull/217#issuecomment-351518729
 
 
   [Markus' comments](https://issues.apache.org/jira/browse/NUTCH-2414?focusedCommentId=16144237&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16144237)
over at the JIRA issue certainly describe the better solution. This patch does, however, address
the original issue description, and on that basis I feel we should merge it into the master branch.
   We can revisit once a Jexl IndexingFilter is developed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Allow LanguageIndexingFilter to actually filter documents by language.
> ----------------------------------------------------------------------
>
>                 Key: NUTCH-2414
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2414
>             Project: Nutch
>          Issue Type: Improvement
>          Components: plugin
>    Affects Versions: 1.13
>            Reporter: Yossi Tamari
>            Priority: Minor
>
> It is often useful to index only pages in selected languages (e.g. only those languages that we intend to search in). At first glance it seems that this is done by LanguageIndexingFilter, but currently all the filter does is add the language as a field to the index.
> We can add a configuration property to LanguageIndexingFilter that will allow it to index only documents in the languages specified in this property.
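
The behaviour requested in the description could be sketched roughly as below. This is a minimal illustration and not the code from the pull request: the class name `LanguageAllowList`, the comma-separated format of the setting, and the use of `"unknown"` for documents without a detected language are all assumptions made for the example. In an actual indexing filter, a rejected document would presumably be dropped inside the filter method; that wiring is not shown here.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper, not part of Nutch: decides whether a document's language is allowed. */
final class LanguageAllowList {
  private final Set<String> allowed;

  /**
   * @param spec comma-separated language codes such as "en,de" (assumed format);
   *             null or blank means no restriction.
   */
  LanguageAllowList(String spec) {
    Set<String> codes = new HashSet<>();
    if (spec != null) {
      for (String code : spec.split(",")) {
        String trimmed = code.trim().toLowerCase(Locale.ROOT);
        if (!trimmed.isEmpty()) {
          codes.add(trimmed);
        }
      }
    }
    this.allowed = Collections.unmodifiableSet(codes);
  }

  /** Returns true if a document with the given detected language should be indexed. */
  boolean accepts(String detectedLanguage) {
    if (allowed.isEmpty()) {
      // Nothing configured: keep the existing behaviour and index everything.
      return true;
    }
    // Assumption: documents without a detected language are treated as "unknown".
    String lang = (detectedLanguage == null || detectedLanguage.trim().isEmpty())
        ? "unknown"
        : detectedLanguage.trim().toLowerCase(Locale.ROOT);
    return allowed.contains(lang);
  }
}
```

With this sketch, `new LanguageAllowList("en, de").accepts("fr")` is false, while an empty setting accepts every language, so installations that do not set the property would see no change.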



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
