lucene-java-user mailing list archives

From Michael McCandless
Subject Re: LookaheadTokenFilter
Date Fri, 06 Sep 2013 11:31:30 GMT
On Thu, Sep 5, 2013 at 8:44 PM, Benson Margulies wrote:
> I'm trying to work through the logic of reading ahead until I've seen a
> marker for the end of a sentence, then applying some analysis to all of the
> tokens of the sentence, and then changing some attributes of each token to
> reflect the results.
> The queue of tokens for a position is just a State, so there isn't an API
> there to set any values.
> So do I need to subclass Position for myself, store the additional
> information in there, and set the attributes as each token comes by on the
> output side?

Yes, that sounds right.  Either that or, on emitting the eventual
Tokens, apply your logic there (because at that point, after
restoreState, you have access to all the attr values for that token).

> I would be grateful for a bit more explanation of afterPosition versus
> incrementToken; some of the mock classes call peek from afterPosition, and
> I expected to see peek called in incrementToken based on the javadoc.

afterPosition is where your subclass can "insert" new tokens.

I think (it's been a while here...) you are allowed to call peekToken
in afterPosition; this is necessary if your logic about inserting
additional tokens leaving a given position depends on future tokens.

But: are you doing any new token insertion?  Or are you just tweaking
the attributes of the tokens that pass through the filter?  If it's
the latter then this class may be overkill ... you could make a simple
TokenFilter.incrementToken that just enumerates & saves all input
tokens, does its processing, then returns those tokens one by one.
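
A minimal sketch of that buffer-then-replay idea, written in plain Java with
no Lucene dependency so that it runs on its own. Tok and
SentenceBufferingFilter are hypothetical names invented for this
illustration, and the "analysis" is a placeholder that tags each token with
the length of its sentence. In a real TokenFilter, next() would correspond to
incrementToken(), and the buffered items would be State objects obtained from
captureState() and put back with restoreState() before the attributes are
adjusted.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for one token's attribute values; not a Lucene class.
final class Tok {
    final String term;
    String tag = "";   // filled in once the whole sentence has been seen

    Tok(String term) { this.term = term; }
}

// Models a filter that reads a full sentence, analyzes it, then replays it.
final class SentenceBufferingFilter {
    private final Iterator<Tok> input;
    private final Deque<Tok> pending = new ArrayDeque<>();

    SentenceBufferingFilter(Iterator<Tok> input) { this.input = input; }

    // Analogue of incrementToken(): next token, or null at end of stream.
    Tok next() {
        if (pending.isEmpty()) {
            fillSentence();
        }
        return pending.poll();
    }

    // Read ahead to the end-of-sentence marker (assumed here to be "."),
    // or to the end of input if no marker arrives.
    private void fillSentence() {
        List<Tok> sentence = new ArrayList<>();
        while (input.hasNext()) {
            Tok t = input.next();
            sentence.add(t);
            if (t.term.equals(".")) {
                break;
            }
        }
        // Placeholder analysis: every token learns its sentence length.
        for (Tok t : sentence) {
            t.tag = "len=" + sentence.size();
            pending.add(t);
        }
    }
}
```

Because the tokens are only released after the sentence marker has been
seen, each one can carry results that depend on tokens that came after it,
which is the case described in the original question.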

Mike McCandless
