lucene-java-user mailing list archives

From "Martin O'Shea" <m.os...@dsl.pipex.com>
Subject How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2
Date Mon, 10 Nov 2014 13:54:14 GMT
I realise that 3.0.2 is an old version of Lucene but if I have Java code as
follows:

 

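import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.util.Version;

int nGramLength = 3;

// Set is an interface, so a concrete implementation such as HashSet is needed.
Set<String> stopWords = new HashSet<String>();
stopWords.add("the");
stopWords.add("and");
...

SnowballAnalyzer snowballAnalyzer = new SnowballAnalyzer(Version.LUCENE_30, "English", stopWords);

// Wraps the Snowball analyzer so that shingles (word n-grams) of up to nGramLength tokens are emitted.
ShingleAnalyzerWrapper shingleAnalyzer = new ShingleAnalyzerWrapper(snowballAnalyzer, nGramLength);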

 

which generates the frequencies of n-grams from a particular string of
text, without stop words, how can I disable the LowerCaseFilter which forms
part of the SnowballAnalyzer? I want to preserve the case of the n-grams
generated so that I can perform various counts according to the presence or
absence of upper-case characters in the n-grams.
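
For illustration only, the kind of count described above can be sketched in plain Java, without Lucene. This is not the SnowballAnalyzer pipeline: there is no stemming or stop-word removal, tokens are split on whitespace, and only n-grams of exactly n tokens are produced. The class name CasePreservingNGrams and its methods are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: builds case-preserving word n-grams.
class CasePreservingNGrams {

    // Returns all n-grams of exactly n whitespace-separated tokens, original case kept.
    public static List<String> nGrams(String text, int n) {
        List<String> result = new ArrayList<String>();
        String trimmed = text.trim();
        if (trimmed.isEmpty() || n < 1) {
            return result;
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i + n <= tokens.length; i++) {
            StringBuilder gram = new StringBuilder(tokens[i]);
            for (int j = 1; j < n; j++) {
                gram.append(' ').append(tokens[i + j]);
            }
            result.add(gram.toString());
        }
        return result;
    }

    // True if the n-gram contains at least one upper-case character.
    public static boolean hasUpperCase(String gram) {
        for (int i = 0; i < gram.length(); i++) {
            if (Character.isUpperCase(gram.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Counts how often each n-gram occurs; "New York" and "new york" stay distinct.
    public static Map<String, Integer> frequencies(List<String> grams) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String gram : grams) {
            Integer current = counts.get(gram);
            counts.put(gram, current == null ? 1 : current + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> grams = nGrams("New York is not new york", 2);
        int withUpper = 0;
        for (String gram : grams) {
            if (hasUpperCase(gram)) {
                withUpper++;
            }
        }
        System.out.println(frequencies(grams));
        System.out.println("n-grams containing upper case: " + withUpper);
    }
}
```

Because the tokens are never lower-cased, an n-gram such as "New York" is counted separately from "new york", which is the distinction a LowerCaseFilter would erase.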

 

I am something of a Lucene newbie. And I should add that upgrading the
version of Lucene is not an option here.

