lucene-dev mailing list archives

From "none none" <>
Subject Re: interesting phrase query issue
Date Thu, 17 Jul 2003 16:52:56 GMT
I believe that searching for "access manager" should return no hits if the document
contains "access, the manager", because the document text is different. I know there is a
stop word in between, so my opinion is to skip "the" and the other stop words at search
level rather than at index level (Google does that), but index them anyway.
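
A minimal sketch of one reading of this suggestion, in plain Python rather than Lucene's own API: every token, stop words included, is indexed with its position, and a phrase matches only when its terms sit at consecutive positions. The helper names tokenize, build_positions and phrase_match are hypothetical and exist only for this illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on non-alphanumeric characters; punctuation is dropped.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_positions(text):
    # Map each term (stop words included) to the positions where it occurs.
    positions = defaultdict(list)
    for pos, term in enumerate(tokenize(text)):
        positions[term].append(pos)
    return positions

def phrase_match(positions, phrase):
    # True only if the phrase terms occur at consecutive positions.
    terms = tokenize(phrase)
    if not terms:
        return False
    return any(
        all(start + i in positions.get(term, []) for i, term in enumerate(terms))
        for start in positions.get(terms[0], [])
    )

doc = build_positions("access, the manager")
print(phrase_match(doc, "access manager"))      # False: "the" sits between the terms
print(phrase_match(doc, "access the manager"))  # True
```

Because "the" keeps its own position, "access" and "manager" are no longer adjacent, so the two-word phrase does not match.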



--------- Original Message ---------

DATE: Thu, 17 Jul 2003 07:53:06
From: Tatu Saloranta <>
To: "Lucene Users List" <>

>On Thursday 17 July 2003 07:20, greg wrote:
>> I have several document sections that are being indexed via the
>> StandardAnalyzer.  One of these documents has the line "access, the
>> manager".  When searching for the phrase "access manager", this document is
>> being returned.  I understand why (at least I think I do): a stop
>> word is "the" and the "," is being removed by the tokenizer.  My question is,
>> is there any way I can avoid having this returned in the results?  My
>> thoughts were to create a new analyzer that indexes the word "the" (blick,
>> too many of those), or index the "," in some way (also not good).  Any
>> suggestions?
>You can also replace all stop words with a "dummy" token ("" might be an ok
>candidate?). That would be similar to indexing "the" (which is probably a
>better idea than indexing ",").
>I'm planning to do something similar for paragraph breaks (in case of plain 
>text, double linefeed, for HTML <p> etc), to prevent similar problems.
>-+ Tatu +-
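
The dummy-token suggestion quoted above can be sketched as follows. This is plain Python, not Lucene code; the stop word list and the placeholder string "_stop_" are assumptions made for the example, and analyze and contains_phrase are hypothetical helpers. The sketch compares dropping stop words outright with replacing each one by the placeholder.

```python
import re

STOP_WORDS = {"the", "a", "an", "of"}  # illustrative list, not StandardAnalyzer's
PLACEHOLDER = "_stop_"                 # hypothetical dummy token

def analyze(text, keep_gap):
    # Lowercase, strip punctuation, then either drop each stop word
    # (keep_gap=False) or replace it with the placeholder (keep_gap=True).
    tokens = [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]
    out = []
    for t in tokens:
        if t in STOP_WORDS:
            if keep_gap:
                out.append(PLACEHOLDER)
        else:
            out.append(t)
    return out

def contains_phrase(doc_tokens, phrase_tokens):
    # True if phrase_tokens appears as a contiguous run inside doc_tokens.
    n = len(phrase_tokens)
    return n > 0 and any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )

dropped = analyze("access, the manager", keep_gap=False)  # ['access', 'manager']
gapped = analyze("access, the manager", keep_gap=True)    # ['access', '_stop_', 'manager']
query = analyze("access manager", keep_gap=True)          # ['access', 'manager']

print(contains_phrase(dropped, query))  # True: the unwanted hit
print(contains_phrase(gapped, query))   # False: the placeholder keeps the gap
```

One consequence of this design is that any stop word matches any other at the same position, so under these assumptions a query for "access a manager" would still match "access, the manager".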
