lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-1077) Analysis Sinks package
Date Tue, 04 Dec 2007 16:17:43 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated LUCENE-1077:
------------------------------------

    Attachment: LUCENE-1077.patch

This is a fairly trivial start to this, but it creates the sinks package in the contrib/Analysis
section and adds a simple TokenRangeSinkTokenizer and test.  This can be used to siphon off
tokens that fall in a range of token positions.  All it does is count the tokens as they go by
and add those whose count falls in the range.  It might be useful for documents that you know
have a certain structure, for instance, if you know the first 5 tokens of your docs are X.
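
The counting behavior described above can be sketched without Lucene at all. This is only
an illustration, not the code in the attached patch: the class name RangeSink, its methods,
the use of plain strings for tokens, and the half-open range [lower, upper) are all
assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a range sink; NOT the class from LUCENE-1077.patch.
// It counts every token offered to it and keeps only those whose position
// falls in [lower, upper). The half-open bounds are an assumption.
class RangeSink {
    private final int lower;  // first position kept (inclusive, assumed)
    private final int upper;  // first position dropped (exclusive, assumed)
    private final List<String> kept = new ArrayList<>();
    private int count = 0;    // number of tokens seen so far

    RangeSink(int lower, int upper) {
        this.lower = lower;
        this.upper = upper;
    }

    // Called once per token, in stream order.
    void add(String token) {
        if (count >= lower && count < upper) {
            kept.add(token);
        }
        count++;  // every token advances the counter, kept or not
    }

    // Forget everything so the sink can be reused for the next document.
    void reset() {
        count = 0;
        kept.clear();
    }

    List<String> getTokens() {
        return kept;
    }
}
```

With new RangeSink(0, 5), feeding a document's tokens in order would keep only the first
five, which matches the "first 5 tokens" case mentioned above.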

More to follow.

> Analysis Sinks package
> ----------------------
>
>                 Key: LUCENE-1077
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1077
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Analysis, contrib/*
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.3
>
>         Attachments: LUCENE-1077.patch
>
>
> With the advent of the new TeeTokenFilter and SinkTokenizer, there are now some interesting
> new things that can be done in the analysis phase of indexing.  See LUCENE-1058.
> This patch provides some new implementations of SinkTokenizer that may be useful.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



