lucene-java-user mailing list archives

From Bernhard Messer <>
Subject Re: English and French documents together / analysis, indexing, searching
Date Thu, 20 Jan 2005 18:28:22 GMT

> Right now I am using StandardAnalyzer but the results are not what I'd 
> hope for. Also since my understanding is that we should use the same 
> analyzer for searching that was used for indexing,
> even if I can manage to guess the language during indexing and apply 
> to the SnowBall analyzer I wouldn't be able to use SnowBall for 
> searching because users want to search through both
> English and French and I suppose I would not get the same results if 
> used with StandardAnalyzer?

You could try building a more complex query: expand the search terms into both
languages, analyzing them once with each language's analyzer. Would this solve your problem?
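Here is a minimal, dependency-free sketch of that idea. It assumes a single field (hypothetically named `contents`) where each document was indexed with the analyzer for its own language. The two toy analyzers are hypothetical stand-ins for real English and French Snowball stemmers; they only strip a few suffixes. The output is a query string in Lucene's classic query syntax. Inside each parenthesized group, `+` marks a required term, and the groups themselves are optional.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

public class BilingualQueryExpansion {

    // Shared helper: lowercase, then split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!tok.isEmpty()) {
                tokens.add(tok);
            }
        }
        return tokens;
    }

    // Strips the first matching suffix, keeping a stem of at least three characters.
    static String stripSuffix(String tok, String... suffixes) {
        for (String s : suffixes) {
            if (tok.endsWith(s) && tok.length() - s.length() >= 3) {
                return tok.substring(0, tok.length() - s.length());
            }
        }
        return tok;
    }

    // Toy "English analyzer": hypothetical stand-in for a Snowball English stemmer.
    static List<String> englishAnalyze(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : tokenize(text)) {
            out.add(stripSuffix(tok, "ing", "ed", "s"));
        }
        return out;
    }

    // Toy "French analyzer": hypothetical stand-in for a Snowball French stemmer.
    static List<String> frenchAnalyze(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : tokenize(text)) {
            out.add(stripSuffix(tok, "ement", "es", "s", "e"));
        }
        return out;
    }

    // Analyzes the raw user query once per analyzer. Each per-language clause
    // requires all of its terms (+), but the clauses themselves are optional,
    // so a document in either language can match.
    static String expand(String field, String userQuery,
                         List<Function<String, List<String>>> analyzers) {
        Set<String> clauses = new LinkedHashSet<>(); // drops identical clauses
        for (Function<String, List<String>> analyzer : analyzers) {
            List<String> terms = analyzer.apply(userQuery);
            if (terms.isEmpty()) {
                continue;
            }
            StringBuilder clause = new StringBuilder("(");
            for (int i = 0; i < terms.size(); i++) {
                if (i > 0) {
                    clause.append(' ');
                }
                clause.append('+').append(field).append(':').append(terms.get(i));
            }
            clauses.add(clause.append(')').toString());
        }
        return String.join(" ", clauses);
    }

    // Convenience wrapper that uses the two toy analyzers defined above.
    static String expandBilingual(String field, String userQuery) {
        List<Function<String, List<String>>> analyzers = List.of(
            BilingualQueryExpansion::englishAnalyze,
            BilingualQueryExpansion::frenchAnalyze);
        return expand(field, userQuery, analyzers);
    }

    public static void main(String[] args) {
        // prints (+contents:recherche +contents:rapide) (+contents:recherch +contents:rapid)
        System.out.println(expandBilingual("contents", "recherches rapides"));
    }
}
```

In real Lucene code you would more likely parse the user input once per analyzer with the query parser, then add the resulting queries as optional clauses of a `BooleanQuery`. The string form above just makes the expansion visible.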

> Another problem with StandardAnalyzer is that it breaks up some words 
> that should not be broken (in our case document identifiers such as 
> ABC-1234 etc) but that's a secondary issue...

This behaviour is implemented in StandardTokenizer, which is used by
StandardAnalyzer. See the documentation of StandardTokenizer:

Many applications have specific tokenizer needs.  If this tokenizer does
not suit your application, please consider copying this source code
directory to your project and maintaining your own grammar-based tokenizer.
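Below is a minimal, Lucene-independent sketch of the kind of rule such a custom tokenizer would add. It assumes identifiers always consist of uppercase letters, a hyphen, then digits, as in ABC-1234. The class and method names are hypothetical. In a real project the rule would live in the copied tokenizer grammar or in your own Tokenizer subclass.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdentifierAwareTokenizer {

    // Assumed identifier format (e.g. ABC-1234). It is listed first in the
    // alternation, so at each position it wins over the generic word rule.
    private static final Pattern TOKEN = Pattern.compile(
        "[A-Z]+-[0-9]+"          // document identifier, kept as one token
        + "|[\\p{L}\\p{N}]+");   // otherwise: a run of letters and/or digits

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            // Lowercase every token so index-time and query-time terms compare equal.
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [see, abc-1234, and, the, follow, up, report]
        System.out.println(tokenize("See ABC-1234 and the follow-up report."));
    }
}
```

Because the identifier pattern is tried first, "follow-up" is still split into two tokens while "ABC-1234" survives intact. The same tokenizer must run at both index and query time, so that a search for ABC-1234 produces the term that was indexed.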

