[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-901:
--------------------------------
Attachment: NUTCH-901-MarkusJelsma.998958.patch
Here's a patch for version 1.2. It adds a backward-compatible setting to nutch-default.xml
and handles that setting in MoreIndexingFilter.java. It's tested and behaves as expected
on my up-to-date 1.2 checkout.
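As a rough illustration of the idea (not the patch itself), the sketch below shows a boolean switch that decides whether a content type is indexed whole or also split on the slash. The class name, method name, and property name are hypothetical and chosen only for this example. It also assumes that split mode emits the full type followed by its parts. The switch defaults to true so that existing behaviour is preserved.

```java
import java.util.ArrayList;
import java.util.List;

class ContentTypeFields {
    // Hypothetical property name, for illustration only; the patch defines its own.
    static final String SPLIT_PROPERTY = "example.indexMore.splitContentType";

    // Returns the values to index for a content type.
    // Assumption: split mode yields the full type plus each slash-separated part.
    static List<String> values(String contentType, boolean split) {
        List<String> out = new ArrayList<>();
        out.add(contentType);
        if (split) {
            for (String part : contentType.split("/")) {
                // Skip empty parts and avoid repeating a type that has no slash.
                if (!part.isEmpty() && !part.equals(contentType)) {
                    out.add(part);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Defaults to true, so users who rely on splitting see no change.
        boolean split = Boolean.parseBoolean(System.getProperty(SPLIT_PROPERTY, "true"));
        System.out.println(values("text/html", split));
    }
}
```

With the switch turned off, only the unsplit value is produced and any tokenization is left to Solr.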
> Make index-more plug-in configurable
> ------------------------------------
>
> Key: NUTCH-901
> URL: https://issues.apache.org/jira/browse/NUTCH-901
> Project: Nutch
> Issue Type: Improvement
> Components: indexer
> Affects Versions: 1.2, 2.0
> Reporter: Markus Jelsma
> Fix For: 2.0
>
> Attachments: NUTCH-901-MarkusJelsma.998958.patch
>
>
> In my case, I don't want the index-more plug-in to split content types on the slash. Tokenization
> is something a Solr instance should take care of. Instead of removing the code (which would
> break compatibility for users who rely on it), we need a way to configure the plug-in not
> to split the content type.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.