lucene-java-user mailing list archives

From "Erick Erickson" <>
Subject Re: Phrase Query Problem
Date Tue, 18 Dec 2007 21:06:52 GMT
This will, indeed, NOT remove stop words. If that is all you need, you're

But you will now have useless words in your index like "the", "is", etc.
Making your own analyzer, by subclassing a suitable existing analyzer or
composing one, will fix you right up if having the extra words in your
index turns out not to be OK.

And it shouldn't change your indexing speed noticeably.
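
To make the trade-off concrete, here is a minimal plain-Java sketch of what a stop list does to the token stream. It is an illustration only and does not use any Lucene classes: the class `StopWordSketch`, the helper `tokenize`, and the two-word stop list are hypothetical names invented for this example, and the tokenization is much cruder than what StandardAnalyzer actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; none of this is Lucene API.
final class StopWordSketch {

    // Lowercase the text, split on runs of non-letter/non-digit characters,
    // and drop every token that appears in stopWords.
    static List<String> tokenize(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty() && !stopWords.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The issue is health safety";

        // Empty stop set: every word is kept, so "the" and "is" end up indexed.
        Set<String> noStopWords = new HashSet<>();
        System.out.println(tokenize(text, noStopWords));

        // Small hypothetical stop list: "the" and "is" are dropped.
        Set<String> someStopWords = new HashSet<>(Arrays.asList("the", "is"));
        System.out.println(tokenize(text, someStopWords));
    }
}
```

With the empty set, a phrase that contains "is" or "the" can be matched word for word, at the cost of indexing those extra tokens.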


On Dec 18, 2007 2:44 PM, Sirish Vadala <> wrote:

> Hmmm... I had come up with a temporary solution for the time being. This
> is how I am initializing the StandardAnalyzer to fix my problem.
> String[] STOP_WORDS = {};
> this.analyzer = new StandardAnalyzer(STOP_WORDS);
> This now indexes all my stop words, and happily it increased my indexing
> time only slightly. Not sure if this is the right solution. Will also do
> some research on custom analyzers.
> Hi,
> 1) Whenever we change to a different analyzer, we need to reindex
>   the whole dataset, whether or not we use WhitespaceAnalyzer.
> 2) Using WhitespaceAnalyzer may increase disk space and slow down
>   indexing because more tokens are indexed; by how much, I do not
>   know.
> 3) WhitespaceAnalyzer also keeps case: for example, if the input text
>   has "Health", the query "health" may not return the doc, so make
>   sure this is what you need. This analyzer will also keep all symbols,
>   like commas and periods. For example, if the text has "Number ONE
>   issue is health safety!", the query "health safety" will not return
>   the doc, because "safety!" is indexed as a token, not "safety".
> I feel the most important thing is to pin down the exact query
> requirements first, and then pick an analyzer.
> Best regards, Lisheng
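
The case and punctuation behaviour described in point 3 can be sketched in plain Java. This is a simplified model, not Lucene code: `WhitespaceSketch`, `whitespaceTokens`, and `phraseMatches` are hypothetical names, and the exact consecutive-token comparison only approximates how a phrase query is matched against whitespace-delimited tokens.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java illustration only; none of this is Lucene API.
final class WhitespaceSketch {

    // Split on whitespace only: case and punctuation are left untouched.
    static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // True if the query's tokens occur consecutively, with identical case
    // and punctuation, among the document's tokens.
    static boolean phraseMatches(String doc, String query) {
        List<String> docTokens = whitespaceTokens(doc);
        List<String> queryTokens = whitespaceTokens(query);
        return Collections.indexOfSubList(docTokens, queryTokens) >= 0;
    }

    public static void main(String[] args) {
        String doc = "Number ONE issue is health safety!";

        // The last token is "safety!", not "safety".
        System.out.println(whitespaceTokens(doc));

        // No match: the query token "safety" differs from "safety!".
        System.out.println(phraseMatches(doc, "health safety"));

        // No match: "one" differs from "ONE" because case is preserved.
        System.out.println(phraseMatches(doc, "Number one"));
    }
}
```

If queries like these need to match, the analyzer has to lowercase tokens and strip punctuation at both index time and query time.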