nutch-dev mailing list archives

From "Alan Tanaman (JIRA)" <j...@apache.org>
Subject [jira] Created: (NUTCH-422) index-extra plugin creates additional fields in the index, based on configurable logic
Date Thu, 28 Dec 2006 19:23:22 GMT
index-extra plugin creates additional fields in the index, based on configurable logic
--------------------------------------------------------------------------------------

                 Key: NUTCH-422
                 URL: http://issues.apache.org/jira/browse/NUTCH-422
             Project: Nutch
          Issue Type: New Feature
          Components: indexer
    Affects Versions: 0.8.1
         Environment: All environments
            Reporter: Alan Tanaman


Extract from the Readme file:

A.  Introduction

    The index-extra plugin lets you configure additional fields to be added to the index,
    based on one of the following sources:
      - The parsed text
      - Metadata fields
      - Fields previously created on the document to be indexed
      - A plain constant string
      - A Java expression that combines one or more of the above and resolves to a string
    A regex can also be applied to any of the above, so that fields can be created from
    patterns extracted from the source.
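
    As a rough illustration of the regex option, the sketch below pulls a field value out
    of a piece of source text.  It does not use the plugin's own classes or configuration
    format: the class RegexFieldSketch, the method extract, the sample text and the pattern
    are all hypothetical, and the pattern is assumed to contain at least one capture group.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the index-extra plugin:
// derives a field value from source text by applying a regex.
class RegexFieldSketch {

    // Returns the first capture group of the first match, or empty if nothing matches.
    // Assumes the regex defines at least one capture group.
    static Optional<String> extract(String source, String regex) {
        Matcher m = Pattern.compile(regex).matcher(source);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Sample parsed text and pattern are illustrative assumptions.
        String parsedText = "Order ref: INV-20061228 shipped from warehouse 7";
        Optional<String> invoice = extract(parsedText, "(INV-\\d+)");
        System.out.println("invoice = " + invoice.orElse("<none>"));
    }
}
```

    A value extracted this way would become the content of the new index field; when the
    pattern does not match, no field value is produced.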

B.  Installation

    1)  Binaries only:
          - Copy the 'index-extra' folder within index-extra-v1.0-bin-java1.5.zip to NUTCHDIR/build
          - Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf, and configure it
          - Enable the plugin by updating the nutch-site.xml file
    2)  Source code (always refer to the Nutch wiki for detailed instructions on building
        Nutch; in short):
          - Copy the 'index-extra' folder within index-extra-v1.0-source.zip to NUTCHDIR/src/plugin
          - Update the build.xml in NUTCHDIR/src/plugin to include the plugin
          - Update the NUTCHDIR/default.properties file to include the plugin
          - Run ant to build
          - Copy the 'index-extra-conf.xml' file to NUTCHDIR/conf, and configure it
          - Enable the plugin by updating the nutch-site.xml file

C.  Known Issues

    1)  For this plugin to work correctly on any document field, the other index filters
    must run first, so that all basic document fields have already been generated.  To do
    this, configure the indexingfilter.order property.  (Please see patch NUTCH-421, which
    enables the indexingfilter.order property.  If this patch is not applied, the plugin
    will still work, but will not be able to use document fields created by other index
    filter plugins.)

    2)  At this stage, field boost cannot be used, because Nutch scoring overrides the
    field boost with its own document-level boost calculation.  This occurs at the end of
    the reduce method of org.apache.nutch.indexer.Indexer.
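
    To show why filter order matters for known issue 1, here is a simplified model of a
    filter chain.  It is not the Nutch IndexingFilter API: the filters are plain functions
    over a map of fields, and the names FilterOrderSketch, BASIC, EXTRA, "title" and
    "titleUpper" are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Consumer;

// Simplified model, not the Nutch API: each "filter" mutates a map of document fields.
class FilterOrderSketch {

    // Stand-in for a basic index filter that creates a "title" field.
    static final Consumer<Map<String, String>> BASIC =
            doc -> doc.put("title", "Nutch Home");

    // Stand-in for an index-extra style filter that derives a new field from "title".
    // It can only do so if "title" already exists when it runs.
    static final Consumer<Map<String, String>> EXTRA = doc -> {
        String title = doc.get("title");
        if (title != null) {
            doc.put("titleUpper", title.toUpperCase(Locale.ROOT));
        }
    };

    // Applies the filters in the given order and returns the resulting fields.
    static Map<String, String> run(List<Consumer<Map<String, String>>> filters) {
        Map<String, String> doc = new LinkedHashMap<>();
        for (Consumer<Map<String, String>> filter : filters) {
            filter.accept(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        // Basic filter first: the derived field is created.
        System.out.println(run(Arrays.asList(BASIC, EXTRA)).containsKey("titleUpper")); // true
        // Derived filter first: "title" is not there yet, so nothing is derived.
        System.out.println(run(Arrays.asList(EXTRA, BASIC)).containsKey("titleUpper")); // false
    }
}
```

    In this model, running the derived filter last plays the role that the
    indexingfilter.order property is meant to play for the plugin.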


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
