nutch-dev mailing list archives

From "Peter Boot (JIRA)" <j...@apache.org>
Subject [jira] Commented: (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic
Date Fri, 21 Dec 2007 21:17:43 GMT

    [ https://issues.apache.org/jira/browse/NUTCH-422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554034 ]

Peter Boot commented on NUTCH-422:
----------------------------------

I am getting errors when trying to compile this plugin against trunk.
Has anyone managed to update it?
Is there a better way to get Nutch to create termVectors?

[echo] Compiling plugin: index-extra
[javac] Compiling 3 source files to /opt/nutch-trunk/build/index-extra/classes
[javac] /opt/nutch-trunk/src/plugin/index-extra/src/java/org/apache/nutch/indexer/extra/ExtraIndexingFilter.java:61: org.apache.nutch.indexer.extra.ExtraIndexingFilter is not abstract and does not override abstract method filter(org.apache.lucene.document.Document,org.apache.nutch.parse.Parse,org.apache.hadoop.io.Text,org.apache.nutch.crawl.CrawlDatum,org.apache.nutch.crawl.Inlinks) in org.apache.nutch.indexer.IndexingFilter
[javac] public class ExtraIndexingFilter implements IndexingFilter {
[javac]   ^
[javac] Note: /opt/nutch-trunk/src/plugin/index-extra/src/java/org/apache/nutch/indexer/extra/ExtraIndexingFilter.java uses or overrides a deprecated API.
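
For anyone reading the error above: javac is saying that IndexingFilter on trunk declares a five-argument filter(...) method that ExtraIndexingFilter does not implement. The sketch below only illustrates that shape. Every type in it is a hypothetical stand-in defined inside the example, not the real Lucene, Nutch or Hadoop class, and the Document return type is an assumption, since the error message lists only the parameter types.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FilterSignatureSketch {
    // Hypothetical stand-ins for the Lucene/Nutch/Hadoop types named in the error.
    static class Document { final Map<String, String> fields = new LinkedHashMap<>(); }
    static class Parse { final String text; Parse(String text) { this.text = text; } }
    static class Text { final String value; Text(String value) { this.value = value; } }
    static class CrawlDatum { }
    static class Inlinks { }

    // Mirrors the parameter list from the javac message; the return type is assumed.
    interface IndexingFilter {
        Document filter(Document doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks);
    }

    // Compiles only because its filter(...) matches the interface exactly.
    // A method with any other parameter list would reproduce the
    // "is not abstract and does not override abstract method" error.
    static class ExtraIndexingFilter implements IndexingFilter {
        @Override
        public Document filter(Document doc, Parse parse, Text url, CrawlDatum datum, Inlinks inlinks) {
            doc.fields.put("extra", parse.text); // illustrative extra field
            doc.fields.put("url", url.value);
            return doc;
        }
    }

    public static void main(String[] args) {
        IndexingFilter filter = new ExtraIndexingFilter();
        Document out = filter.filter(new Document(), new Parse("parsed text"),
                new Text("http://example.com/"), new CrawlDatum(), new Inlinks());
        System.out.println(out.fields);
    }
}
```

In the real plugin the fix would presumably be to change ExtraIndexingFilter.filter so its parameters match the list in the error message, but that has not been verified against trunk here.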

> index-extra plugin creates additional fields in the index, based on configurable logic
> --------------------------------------------------------------------------------------
>
>                 Key: NUTCH-422
>                 URL: https://issues.apache.org/jira/browse/NUTCH-422
>             Project: Nutch
>          Issue Type: New Feature
>          Components: indexer
>    Affects Versions: 0.8.1
>         Environment: All environments
>            Reporter: Alan Tanaman
>            Assignee: Sami Siren
>         Attachments: index-extra-v1.0-bin-java1.5.zip, index-extra-v1.0-source.zip
>
>
> Extract from the Readme file:
> A.  Introduction
>     The index-extra plugin allows you to configure additional fields that you wish to be added to the index, based on one of the following sources:
>       - The parsed text
>       - Meta data fields
>       - Previously created document-to-be-indexed fields
>       - Plain constant string
>       - Java expression combining one or more of the above, and resolving to a string
>     A regex can also be applied to any of the above, allowing fields to be created based on patterns extracted from the source.
> B.  Installation
>     1)  Binaries only:  Copy the 'index-extra' folder within index-extra-v1.0-bin-java1.5.zip to NUTCHDIR/build
>                         Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf, and configure
>                         Enable the plugin by updating the nutch-site.xml file
>     2)  Source code:    Always refer to the Nutch wiki for detailed instructions on building Nutch.  In short:
>                         Copy the 'index-extra' folder within index-extra-v1.0-source.zip to NUTCHDIR/src/plugin
>                         Update the build.xml in NUTCHDIR/src/plugin to include the plugin
>                         Update the NUTCHDIR/default.properties file to include the plugin
>                         Run ant to build
>                         Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf, and configure
>                         Enable the plugin by updating the nutch-site.xml file
> C.  Known Issues
>     1)  For this plugin to work correctly on any document field, the other index filters must run first, so that all basic document fields have already been generated.  To do this, configure the indexingfilter.order property.  (Please see patch NUTCH-421 to enable the indexingfilter.order property. If this patch is not applied, the plugin will still work, but will not be able to use document fields created by other index filter plugins.)
>     2)  At this stage, field boost cannot be used, as Nutch scoring overrides the field boost with its own document-level boost calculation.  This occurs at the end of org.apache.nutch.indexer.Indexer's reduce method.
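
The Readme's introduction says a regex can be applied to a source to create a field from an extracted pattern. As a rough illustration of that idea only, here is a minimal sketch using java.util.regex. The class name, the extractField helper and the sample strings are all hypothetical; this is not the plugin's code and it does not read index-extra-conf.xml.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFieldSketch {
    /**
     * Hypothetical helper: returns the first capture group of the first match,
     * the whole match when the regex has no groups, or null when nothing matches.
     */
    static String extractField(String source, String regex) {
        Matcher m = Pattern.compile(regex).matcher(source);
        if (!m.find()) {
            return null;
        }
        return m.groupCount() >= 1 ? m.group(1) : m.group();
    }

    public static void main(String[] args) {
        // The source could be parsed text, a metadata value, or an existing field value.
        String parsedText = "Order ref: ABC-1234 shipped";
        System.out.println(extractField(parsedText, "ref: ([A-Z]+-\\d+)"));
    }
}
```

Returning null when nothing matches lets a caller skip adding the field for that document, which seems the least surprising behaviour for an optional index field.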

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

