lucene-java-user mailing list archives

From Sirish Vadala <>
Subject RE: Phrase Query Problem
Date Tue, 18 Dec 2007 19:44:50 GMT

Hmmm... I have come up with a temporary solution for now. This is
how I am initializing the StandardAnalyzer to work around my problem.

String[] STOP_WORDS = {};
this.analyzer = new StandardAnalyzer(STOP_WORDS);

This now indexes all my stop words, and happily it increased my
indexing time only slightly. I am not sure if this is the right
solution. I will also do some research on custom analyzers.
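
To illustrate what passing an empty stop word array changes, here is a
rough plain-Java sketch. It does not use Lucene and is not how
StandardAnalyzer is implemented; the class StopWordSketch, the method
analyze and the sample stop list are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Lucene: a crude imitation of an
// analyzer that lowercases, splits on non-alphanumerics and drops stop words.
class StopWordSketch {

    static List<String> analyze(String text, String[] stopWords) {
        Set<String> stops = new HashSet<>(Arrays.asList(stopWords));
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            // Keep only non-empty tokens that are not in the stop set.
            if (!token.isEmpty() && !stops.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "To be or not to be";

        // Assumed sample stop list, chosen only for this example.
        String[] sampleStops = {"to", "be", "or", "not"};
        System.out.println(analyze(text, sampleStops));    // nothing left to index

        // Empty stop list, as in the workaround above: every token is kept.
        System.out.println(analyze(text, new String[] {}));
    }
}
```

With the sample stop list nothing from the text survives, while the
empty array keeps all six tokens, which is why the index grows a bit.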


1) Whenever we change to a different analyzer, we need to reindex
   the whole dataset, whether or not the new one is WhitespaceAnalyzer.
2) Using WhitespaceAnalyzer may increase disk usage and slow down
   indexing because more tokens are indexed; by how much, I do not
   know.
3) WhitespaceAnalyzer also preserves case. For example, if the input
   text has "Health", the query "health" may not return the doc, so
   make sure this is what you need. This analyzer also keeps all
   symbols, such as commas and periods. For example, if the text has
   "Number ONE issue is health safety!", the query "health safety"
   will not return the doc, because "safety!" is indexed as a token,
   not "safety".
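
The case and punctuation behaviour in point 3 can be sketched in plain
Java. This is only an imitation under simple assumptions (split on
whitespace versus lowercase and split on non-alphanumerics), not
Lucene's actual tokenizers; the class AnalyzerSketch and its methods
are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene.
class AnalyzerSketch {

    // Imitates whitespace-only tokenization: case and punctuation are kept.
    static List<String> whitespaceTokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Imitates a normalizing analyzer: lowercase, split on non-alphanumerics.
    static List<String> normalizedTokens(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // True if the phrase tokens occur consecutively in the document tokens.
    static boolean containsPhrase(List<String> doc, List<String> phrase) {
        return !phrase.isEmpty() && Collections.indexOfSubList(doc, phrase) >= 0;
    }

    public static void main(String[] args) {
        String text = "Number ONE issue is health safety!";
        String query = "health safety";

        // Whitespace tokens end with "safety!", so the phrase does not match.
        System.out.println(containsPhrase(whitespaceTokens(text), whitespaceTokens(query)));

        // Normalized tokens end with "safety", so the phrase matches.
        System.out.println(containsPhrase(normalizedTokens(text), normalizedTokens(query)));
    }
}
```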

I feel the most important thing is to pin down the exact query
requirements first, and then pick the analyzer.

Best regards, Lisheng
