lucene-dev mailing list archives

From Yonik Seeley <ysee...@gmail.com>
Subject Re: TokenFilters eating position increments
Date Thu, 22 Sep 2005 20:51:48 GMT
> Thoughts?

LOL! You're psychic.
http://issues.apache.org/jira/browse/LUCENE-438

-Yonik
Now hiring -- http://tinyurl.com/7m67g

On 9/22/05, Erik Hatcher <erik@ehatchersolutions.com> wrote:
>
> Yonik identified an interesting issue with LUCENE-437:
> http://issues.apache.org/jira/browse/LUCENE-437
>
> I patched the SnowballFilter, but then looked at other filters and we
> have the same issue with some of them (like StandardFilter,
> GermanStemFilter, GreekLowerCaseFilter, and others that create a new
> Token).
>
> To perhaps alleviate this situation in the future, maybe we should
> add another constructor to Token:
>
> public Token(String text, int start, int end, String typ, int positionIncrement)
>
> Or maybe one that clones an existing token:
>
> public Token(Token template, String newText)
>
> where all the metadata for the token (start, end, type, and position
> increment) is copied and the newText is used for the Token text
> instead. Filters don't generally change offsets, type, or position
> increments anyway - the majority change the text for stemming or
> lowercasing purposes.
>
> Thoughts?
>
> Erik
>
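
For illustration, here is a minimal sketch of the two constructors proposed above. It does not use Lucene's actual Token class: SketchToken, TokenTemplateDemo, and lowerCase are hypothetical names invented for this example, and the field layout is an assumption. The idea is that a filter which only rewrites the text goes through the cloning constructor, so the position increment (along with offsets and type) carries over instead of being lost.

```java
// Hypothetical stand-in for Lucene's Token; not the real class.
final class SketchToken {
    final String text;
    final int start;
    final int end;
    final String type;
    final int positionIncrement;

    // First proposal: a constructor that takes the position increment explicitly.
    SketchToken(String text, int start, int end, String typ, int positionIncrement) {
        if (positionIncrement < 0) {
            throw new IllegalArgumentException("positionIncrement must be >= 0");
        }
        this.text = text;
        this.start = start;
        this.end = end;
        this.type = typ;
        this.positionIncrement = positionIncrement;
    }

    // Second proposal: copy all metadata from a template token and
    // replace only the text.
    SketchToken(SketchToken template, String newText) {
        this(newText, template.start, template.end, template.type,
             template.positionIncrement);
    }
}

final class TokenTemplateDemo {
    // A filter step that changes only the text, as stemming or
    // lowercasing filters typically do.
    static SketchToken lowerCase(SketchToken in) {
        return new SketchToken(in, in.text.toLowerCase(java.util.Locale.ROOT));
    }

    public static void main(String[] args) {
        // An increment of 2 models a token that follows a removed token.
        SketchToken original = new SketchToken("Foxes", 4, 9, "word", 2);
        SketchToken lowered = lowerCase(original);
        System.out.println(lowered.text + " +" + lowered.positionIncrement);
        // prints "foxes +2"
    }
}
```

With the cloning form, a filter author cannot forget to copy the increment, which is the failure mode this thread is about.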
