Hi,
You cannot change the behavior of the predefined analyzers. But since Lucene 5 there is no need
to write your own Analyzer subclass to define a custom one. Just use CustomAnalyzer and describe
your analysis chain via its fluent builder API (see the example in the Javadocs):
https://lucene.apache.org/core/5_3_1/analyzers-common/org/apache/lucene/analysis/custom/CustomAnalyzer.html
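
For the use case in the question (keep punctuation, lowercase, stem), a chain might look roughly like the sketch below. It assumes lucene-core and lucene-analyzers-common are on the classpath. The class name, the field name "body" and the choice of the "germanLightStem" filter are only illustrative assumptions; pick whatever stemmer factory fits your language.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical class name, for illustration only.
public class CustomAnalyzerSketch {

    // Whitespace tokenizer (keeps punctuation attached to the terms),
    // then lowercasing, then a German stemmer (assumed choice).
    public static Analyzer buildAnalyzer() throws IOException {
        return CustomAnalyzer.builder()
            .withTokenizer("whitespace")
            .addTokenFilter("lowercase")
            .addTokenFilter("germanLightStem")
            .build();
    }

    // Runs the analyzer over the text and collects the resulting terms.
    public static List<String> analyze(String text) throws IOException {
        List<String> terms = new ArrayList<>();
        try (Analyzer analyzer = buildAnalyzer();
             TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                      // required before incrementToken()
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(analyze("Das Haus, die HÄUSER!"));
    }
}
```

The whitespace tokenizer only splits on whitespace, so commas and exclamation marks stay glued to the neighbouring word. That is what was asked for, but it is also why the note about stemmers matters.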
Please note: Language-specific stemmers will not work correctly if the terms still contain
punctuation! Whether lowercasing is needed before stemming also depends on the stemmer.
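
To see why punctuation gets in the way, here is a toy sketch in plain Java. The suffix stripper is a made-up stand-in and not one of Lucene's stemmers; it only shows that a rule matching the end of a term no longer fires when a comma is attached.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; the "stemmer" is a toy, not a Lucene component.
public class PunctuationStemDemo {

    // Toy rule: strip a trailing "s" from terms longer than 3 characters.
    public static String toyStem(String term) {
        if (term.length() > 3 && term.endsWith("s")) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    // Lowercase, split on whitespace only (punctuation stays), then stem.
    public static List<String> stemAll(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(toyStem(t));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "houses," keeps its suffix because the term ends in ",", not "s".
        System.out.println(stemAll("Houses, houses"));
    }
}
```

Real stemmers have many more rules, but most of them also look at the end of the term, so the same word with and without trailing punctuation can end up as two different index terms.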
Uwe
-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de
> -----Original Message-----
> From: marco turchi [mailto:marco.turchi@gmail.com]
> Sent: Saturday, November 14, 2015 5:39 PM
> To: java-user@lucene.apache.org
> Subject: Language Specific Analyzer
>
> Dear Users,
> I need to develop my language specific analyzer that:
> 1) does not remove punctuations
> 2) lowercases and stems each term in the text.
>
> I have tried some of the pre-implemented language analyzers (e.g. the German
> and Italian analyzers), but they remove punctuation. I'm not sure, but
> probably what I need is the whitespace analyzer instead of the standard
> analyzer.
>
> Is there a way to force each language specific analyzer to use the
> whitespace analyzer or in general not to remove punctuations?
>
> Thanks a lot!
> Marco