lucene-dev mailing list archives

From stephane vaucher <vauc...@LUB.UMontreal.CA>
Subject Re: Should Token be immutable?
Date Mon, 06 Jan 2003 18:00:06 GMT
I don't mind spending 5 minutes (which I've just done) to implement the 
changes required to make Token immutable with (what I believe are) the 
appropriate changes to PorterStemFilter, LowerCaseFilter, StopFilter. 
I've got a few questions though:

1) Does anyone mind? Will it break anything?
2) Are there unit tests for this (particularly for PorterStemFilter)? The 
changes are obviously not spectacular, but I'd prefer not to screw 
everyone up...
3) I've checked out the latest version of Lucene. Is there anything 
special I need to do if I get the go-ahead to check my stuff in (like a 
dev list review)?
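
For illustration, here is a minimal sketch of the kind of change described above. It is not the actual patch or the Lucene source: ImmutableToken, withTermText and LowerCaseSketch are hypothetical names invented for this example. The idea is that a filter returns a new token instead of overwriting the text of the one it received, and that the original token can be reused when the text does not change.

```java
import java.util.Locale;

// Hypothetical stand-in for a token class; NOT the actual Lucene Token.
final class ImmutableToken {
    private final String termText;
    private final int startOffset;
    private final int endOffset;

    ImmutableToken(String termText, int startOffset, int endOffset) {
        this.termText = termText;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    String termText() { return termText; }
    int startOffset() { return startOffset; }
    int endOffset() { return endOffset; }

    // Returns a token with the new text and the same offsets.
    // If the text is unchanged, this instance is reused, so no object is created.
    ImmutableToken withTermText(String newText) {
        return newText.equals(termText)
                ? this
                : new ImmutableToken(newText, startOffset, endOffset);
    }
}

// Hypothetical filter step: lower-cases a token without mutating it.
final class LowerCaseSketch {
    static ImmutableToken lowerCase(ImmutableToken t) {
        return t.withTermText(t.termText().toLowerCase(Locale.ROOT));
    }
}
```

With this shape, a token that is already lower case passes through the filter without any new allocation, which is one reason the worst-case estimate of extra object creation quoted below may be pessimistic.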


Brian Goetz wrote:

>> I've read rapidly through the analyser's code, but I'm in no way a 
>> lucene master. If I understood your statement correctly, you are 
>> saying that we would multiply the number of tokens by 1.5 per 
>> tokeniser it uses. A potential "optimisation" would be that sometimes 
>> the string could be reused since it's immutable as well.
> Actually, I was saying that's the absolute worst case.  It wouldn't 
> surprise me to see that the actual effect is that it results in only a 
> 10 or 15% increase in object creation during tokenization, not only 
> for the reason you state, but also because there might well be other 
> object creations on a per-token basis that we're not seeing.
>> Personally, I believe it would be cleaner to make it immutable (I 
>> think that's why this thread started), so +1.
> Yup.
> Immutability -- good.  Mutability just to save a few cycles -- bad.
> -- 
> Brian Goetz
> Quiotix Corporation
> Tel: 650-843-1300  Fax: 650-324-8032
