lucene-dev mailing list archives

From Grant Ingersoll <>
Subject Re: TokenStream and Token APIs
Date Tue, 21 Oct 2008 10:18:57 GMT

On Oct 21, 2008, at 1:39 AM, Michael Busch wrote:
>> Perhaps it would be useful for Lucene to offer exactly one subclass  
>> of Token that we guarantee will always have all known Attributes  
>> (i.e. the ones Lucene provides)  available to it for casting  
>> purposes.
> Yeah we could do that. In fact, I did exactly this when I started  
> working on this patch. I created a class called PlainToken, which  
> had all the termBuffer and attributes logic, and changed Token to  
> extend it. Then the new getToken() method would return an instance  
> of PlainToken. My main concern with this approach is that it will  
> make the code in the indexer more complicated, because it always has  
> to check if we have a Token or PlainToken; if it's a Token then it  
> has to use the get*() method directly, for a PlainToken it has  
> to check for the *Attributes. So that's a bit messy (it's in fact  
> exactly like that in the current patch for backwards-compatibility,  
> but we could clean this up in 3.0). So for code simplicity I'm  
> slightly in favor of not creating a class that implements a  
> default set of functionality without Attributes.
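
To make the concern in the quoted paragraph concrete, here is a minimal
sketch of the kind of branching the indexer would need under the
PlainToken approach. Everything in it is a hypothetical stand-in
(SketchAttribute, SketchOffsetAttribute, IndexerSketch, the -1 sentinel,
and the shape of PlainToken and Token); it is not the code from the
patch and not Lucene's actual API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical marker interface standing in for an attribute type.
interface SketchAttribute {}

// Hypothetical attribute carrying offsets.
final class SketchOffsetAttribute implements SketchAttribute {
    final int start;
    final int end;

    SketchOffsetAttribute(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

// Assumed shape of PlainToken: it only knows about attributes.
class PlainToken {
    private final Map<Class<?>, SketchAttribute> attributes = new HashMap<>();

    void addAttribute(SketchAttribute attribute) {
        attributes.put(attribute.getClass(), attribute);
    }

    boolean hasAttribute(Class<?> type) {
        return attributes.containsKey(type);
    }

    <T extends SketchAttribute> T getAttribute(Class<T> type) {
        return type.cast(attributes.get(type));
    }
}

// Assumed shape of the old-style Token: it exposes get*()-style methods directly.
class Token extends PlainToken {
    private final int startOffset;

    Token(int startOffset) {
        this.startOffset = startOffset;
    }

    int startOffset() {
        return startOffset;
    }
}

// The consumer has to branch on the concrete type every time it reads a value.
class IndexerSketch {
    static int startOffsetOf(PlainToken token) {
        if (token instanceof Token) {
            // Old-style token: call the method directly.
            return ((Token) token).startOffset();
        }
        if (token.hasAttribute(SketchOffsetAttribute.class)) {
            // Attribute-only token: look the value up via its attribute.
            return token.getAttribute(SketchOffsetAttribute.class).start;
        }
        return -1; // assumption: sentinel meaning "no offset information"
    }
}
```

Each value the indexer reads would need the same two-way check, which is
the messiness being described.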

Yes that would be messy, but not exactly what I was proposing.  I was  
originally thinking we needed a derived class, but now it seems like  
we should just keep convenience methods on Token itself.

That is, why not just have Token implement both the attribute methods  
and dummy wrappers for the guaranteed-to-exist Attributes that Lucene  
provides? For example:

public int startOffset(){
	return getAttribute(OffsetAttribute.class).startOffset();
}

This makes back-compat a snap. It also causes less pain for people,  
b/c the Analyzer/Token code is more than likely one of the most  
customized pieces of Lucene.
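
A slightly fuller sketch of the idea, for illustration only: a token
that stores its data in attributes but keeps thin convenience methods
that delegate to them. The Attribute interface, the simplified
OffsetAttribute, the SketchToken class and its map-based storage are all
assumptions made up for this example; they are not Lucene's actual
classes or the code in the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical marker interface, not Lucene's real Attribute.
interface Attribute {}

// Hypothetical, simplified offset attribute.
final class OffsetAttribute implements Attribute {
    private int startOffset;
    private int endOffset;

    void setOffset(int startOffset, int endOffset) {
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    int startOffset() { return startOffset; }

    int endOffset() { return endOffset; }
}

// A token that implements the attribute methods and also keeps
// wrapper methods for the attributes it guarantees to have.
class SketchToken {
    private final Map<Class<? extends Attribute>, Attribute> attributes = new HashMap<>();

    SketchToken() {
        // The guaranteed attribute is registered up front, so the
        // wrappers below can rely on it being present.
        addAttribute(new OffsetAttribute());
    }

    void addAttribute(Attribute attribute) {
        attributes.put(attribute.getClass(), attribute);
    }

    <T extends Attribute> T getAttribute(Class<T> type) {
        Attribute attribute = attributes.get(type);
        if (attribute == null) {
            throw new IllegalArgumentException("No attribute of type " + type.getName());
        }
        return type.cast(attribute);
    }

    // Convenience wrappers: existing callers keep using these, and they
    // simply delegate to the attribute.
    public int startOffset() {
        return getAttribute(OffsetAttribute.class).startOffset();
    }

    public int endOffset() {
        return getAttribute(OffsetAttribute.class).endOffset();
    }

    public void setOffset(int startOffset, int endOffset) {
        getAttribute(OffsetAttribute.class).setOffset(startOffset, endOffset);
    }
}

class SketchTokenDemo {
    public static void main(String[] args) {
        SketchToken token = new SketchToken();
        token.setOffset(3, 8);
        // Both access styles read the same underlying attribute.
        System.out.println(token.startOffset());
        System.out.println(token.getAttribute(OffsetAttribute.class).endOffset());
    }
}
```

With this shape, callers that only know the old methods and callers that
ask for attributes both work against the same object, so a consumer
would not need to branch on the token's type.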
