lucene-dev mailing list archives

From "Rodrigo Reyes" <>
Subject Re: Searcher/Reader/Writer Management
Date Wed, 03 Apr 2002 19:34:10 GMT

I'd like to respond on this point:

> 5. Can someone imagine situation when more than one Analyzers are used in
> an application?

 Not only can I imagine such a situation, but I'd also strongly recommend it
for any high-quality application! If you are just targeting speed and light
CPU usage, sure, one single analyzer is enough. But your application will
get the precision/recall it deserves. A nice search engine should be
flexible enough to use several analyzers, and combine their results to
retrieve the best possible recall/precision. For example, say you are
looking for something related to "selling toothbrushes". The application
should retrieve all the occurrences matching exactly "selling toothbrushes"
(using a strict analyzer), but it may also retrieve "sell toothbrush" (using
a stemming normalizer). Why not retrieve "buy toothbrush" or "sell dental
tools" as well (a kind of semantic normalizer/analyzer)? One could also
imagine retrieving "Selin Toothbrushies" (phonetic normalizer).
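
To make the toothbrush example concrete, here is a minimal sketch of three
analyzers of increasing aggressiveness. It is plain Python, not Lucene code:
the function names are hypothetical, the stemmer is a toy suffix stripper
(not Porter's algorithm), and the phonetic key is a crude consonant skeleton
(not Soundex or Metaphone). The semantic analyzer is left out because it
would need a thesaurus.

```python
import re


def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def strict(text):
    """Strict analyzer: tokens are kept as written (apart from case)."""
    return tokenize(text)


def stem(token):
    """Toy suffix stripper (an assumption for illustration only)."""
    for suffix in ("ies", "ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def stemming(text):
    """Stemming analyzer: each token is reduced to a rough stem."""
    return [stem(t) for t in tokenize(text)]


def phonetic_key(token):
    """Crude phonetic key: drop vowels, collapse repeated letters,
    and keep at most the first three remaining consonants."""
    consonants = re.sub(r"[aeiouy]", "", token)
    collapsed = re.sub(r"(.)\1+", r"\1", consonants)
    return collapsed[:3]


def phonetic(text):
    """Phonetic analyzer: each token is replaced by its phonetic key."""
    return [phonetic_key(t) for t in tokenize(text)]
```

With these definitions, "sell toothbrush" and "selling toothbrushes" differ
under strict() but produce the same terms under stemming(), and "Selin
Toothbrushies" only matches the query under phonetic().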

 OK, so this increases the recall, but unfortunately it drastically reduces
the precision, right? Wrong: all these analyzers should be ordered, and the
final result should be a calculation using the results of all those indexes.
For instance, the results of the strict-analyzer index should weigh more
than those of the stemming index, which should weigh more than those of the
phonetic index, etc. The very simple reason is that the more aggressive the
normalization process, the less likely (and the more hazardous) it is that a
match is exactly what the user is looking for. Sure, it's CPU-intensive, but
here is the dilemma of search engines: be fast or be smart. My belief is
that Lucene, as a search engine, should allow both kinds of applications
(and I personally prefer smart search engines to fast ones).
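
The ordering described above could be sketched as a weighted merge of the
hit lists coming back from each index. This is only one possible
calculation, under stated assumptions: the weights (4, 2, 1), the combine()
function, and the document ids are all hypothetical, and each index is
assumed to return a plain set of matching document ids.

```python
from collections import defaultdict

# Hypothetical weights: the stricter the analyzer, the heavier its hits.
WEIGHTS = {"strict": 4, "stemming": 2, "phonetic": 1}


def combine(hits_by_analyzer, weights=WEIGHTS):
    """Sum, for every document, the weights of the indexes that matched it.

    Returns (doc_id, score) pairs, best score first; ties are broken by
    doc_id so that the ordering is deterministic.
    """
    scores = defaultdict(int)
    for analyzer, hits in hits_by_analyzer.items():
        for doc_id in hits:
            scores[doc_id] += weights[analyzer]
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up results for the query "selling toothbrushes".
hits = {
    "strict": {"doc1"},
    "stemming": {"doc1", "doc2"},
    "phonetic": {"doc1", "doc2", "doc3"},
}
ranking = combine(hits)
```

Here doc1 is found by all three indexes and ranks first, while doc3, found
only by the phonetic index, ranks last. A document matched only by an
aggressive analyzer is still returned, but it can never outrank an exact
match.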

