lucene-java-user mailing list archives

From Paul Taylor
Subject Re: Dealing with special cases in analyser
Date Thu, 18 Mar 2010 06:50:09 GMT
Grant Ingersoll wrote:
> On Mar 17, 2010, at 11:34 AM, Paul Taylor wrote:
>> Grant Ingersoll wrote:
>>> What's your current chain of TokenFilters? How many exceptions do you expect? That is, could you enumerate them?
>> Very few. Yes, I could enumerate them, but I'm not sure exactly what you are suggesting. What I was going to do was add them to the charConvertMap (when I posted, I thought this was only for individual chars, not strings).
> You could modify whichever filter is removing them to take in a protected words list and then short-circuit so that it does not remove those tokens. This would be a hash map lookup, which should be faster than the char replacement you are considering. Many of the stemmers do this.
Hmm, they are removed by the tokenizer, not a filter, because they are
punctuation chars. I suppose I could try to modify the JFlex file.
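
For reference, here is a rough sketch of the protected-words idea Grant describes, written in plain Java with no Lucene classes. The class name ProtectedTokenSketch and its methods are hypothetical and are not part of any Lucene API. It also assumes the special tokens actually reach the filtering step, which is not the case in my setup, since the tokenizer drops them first.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: not a Lucene TokenFilter.
final class ProtectedTokenSketch {
    private final Set<String> protectedWords;

    ProtectedTokenSketch(Set<String> protectedWords) {
        // Copy into a HashSet so each lookup is a constant-time hash lookup.
        this.protectedWords = new HashSet<>(protectedWords);
    }

    // True when the token contains no letter or digit at all.
    static boolean isPunctuationOnly(String token) {
        for (int i = 0; i < token.length(); i++) {
            if (Character.isLetterOrDigit(token.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Drops punctuation-only tokens unless they are in the protected set.
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            // Short circuit: a protected token is kept without further checks.
            if (protectedWords.contains(token) || !isPunctuationOnly(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

In a real analyzer the same check would sit inside whichever filter removes the tokens, with the protected set passed in at construction time.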
