lucene-java-user mailing list archives

From "Uwe Schindler" <>
Subject RE: Language Specific Analyzer
Date Sat, 14 Nov 2015 17:10:34 GMT

You cannot change the behavior of the predefined analyzers! But since Lucene 5 there is no need
to write your own subclass to define a custom analyzer. Just use CustomAnalyzer and define
via its fluent builder API what your analysis should look like (see the example in the Javadocs).
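
As a rough illustration of the builder approach, here is a minimal sketch that assumes the lucene-analyzers-common module is on the classpath. The class name PunctuationPreservingAnalyzer is hypothetical, and the choice of factories ("whitespace", "lowercase", "germanLightStem") is an assumption made for this example, not the chain shown in the Javadocs. Swap in whichever stemmer factory fits your language.

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

// Hypothetical helper class, named for this example only.
final class PunctuationPreservingAnalyzer {

    /**
     * Builds the chain: whitespace tokenizer -> lowercase filter -> German light stemmer.
     * The whitespace tokenizer splits on whitespace only, so punctuation stays
     * attached to the terms it touches.
     */
    public static Analyzer build() throws IOException {
        return CustomAnalyzer.builder()
            .withTokenizer("whitespace")        // assumption: keep punctuation in tokens
            .addTokenFilter("lowercase")        // lowercase before stemming
            .addTokenFilter("germanLightStem")  // assumption: one of several German stemmers
            .build();
    }
}
```

The factories are looked up by name, so a misspelled name should only show up as an error when build() is called, not at compile time.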

Please note: Language-specific stemmers will fail to work correctly if the terms still contain
punctuation! Whether lowercasing is needed before stemming also depends on the stemmer.
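
To make the punctuation caveat concrete, here is a small self-contained sketch in plain Java. It does not use Lucene: toyStem is a hypothetical stand-in that only strips a trailing "s", and the class and method names are invented for this example. Real stemmers are far more elaborate, but they also work on word endings, so attached punctuation can hide the suffix in a similar way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy demonstration only; none of this is Lucene code.
final class ToyStemDemo {

    // Hypothetical toy stemmer: removes a trailing "s" from terms longer than 3 chars.
    public static String toyStem(String term) {
        if (term.length() > 3 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Splits on whitespace only (punctuation stays attached), lowercases, then stems.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            terms.add(toyStem(token.toLowerCase(Locale.ROOT)));
        }
        return terms;
    }

    public static void main(String[] args) {
        // The first "Analyzers" is stemmed; the second keeps its "s" because
        // the trailing comma hides the suffix from the stemmer.
        System.out.println(analyze("Analyzers index Analyzers,"));
    }
}
```

The same word ends up as two different terms, "analyzer" and "analyzers,", so a query for one will not match the other.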


Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen

> -----Original Message-----
> From: marco turchi []
> Sent: Saturday, November 14, 2015 5:39 PM
> To:
> Subject: Language Specific Analyzer
> Dear Users,
> I need to develop my own language-specific analyzer that:
> 1) does not remove punctuation
> 2) lowercases and stems each term in the text.
> I have tried some of the pre-implemented language analyzers (e.g. the German
> and Italian analyzers), but they remove punctuation. I'm not sure, but
> probably what I need is the whitespace analyzer instead of the standard
> analyzer.
> Is there a way to force each language-specific analyzer to use the
> whitespace analyzer or, in general, not to remove punctuation?
> Thanks a lot!
> Marco
