lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ian Lea <>
Subject Re: Anyway to not bother scoring less good matches ?
Date Thu, 05 May 2011 10:59:49 GMT
for an excellent article and solution to the problem with common

You could also consider using, and caching and reusing, filters for
the tnum and tracks fields.


On Thu, May 5, 2011 at 11:31 AM, Paul Taylor <> wrote:
> On 05/05/2011 11:13, Ahmet Arslan wrote:
>>> Yes correct, but I have looked and the list of
>>> optimizations before. What was clear from profiling was that
>>> it wasnt the searching part that was slow (a query run on
>>> the same index with only a few matching docs ran super fast)
>>> the slowness only occurs when there are loads of matching
>>> docs, and spends most of its time in scorer that is why I
>>> was trying to remove the poor matches.
>> Okey all clear. Can you give us some example query strings where there are
>> loads of matching?
>> Do you use stop word filter? Could it be case described as
>> "As you approach the upper limits of a single machine,
>> extremely frequent terms (called stop words) can become very
>> expensive in the wrong query. If part of a top level BooleanQuery, a
>> SHOULD clause that appears in every document will cause a match and
>> score for every document in your index."
> We used to use the default stop word list but have no stop words because
> Lucene is used to match very short fields  relating to Musicdata such as
> artist or album name, therefore the default stop words really need to be
> included to get good matches, for example how would you match the artist
> 'The The' otherwise, so use of a stop word word list is not an option.
> If people construct good queries thee is no problem, but the trouble is that
> many users just OR everything they are looking for because they don't want a
> good match rejected because just one term fails, but the problem is there
> are a number of very popular terms, for example the following query:
> tnum:(6) qdur:(189) artist:(tama) track:(ibata) tracks:(10) release:(the
> global rhythm september 2002)
> will match any song that is on an album with 10 tracks, any song which is
> trackno 6 on an album, and any release containg the word 'the' , when really
> what they are looking for is the song 'ibata' by artist 'tama',
> This matches over a million documents (songs) , but doesn't match any well,
> because the song 'ibata' by 'tama' isnt actually in the index !
> So I dont think the query is very good but I cannot force users to submit
> better queries, but I want to protect the server by reducing the time these
> kind of query take (upto 1 second as opposed to the more usual 100
> milliseconds) and I hope that forcing x number of terms to match would do
> that.
> Paul
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message