I believe that searching for "access manager" should return no hits if the document contains "access, the manager", because the text is different. I know there is a stop word in between, so my opinion is to skip "the" (and all the other stop words) at the search level rather than at the index level (Google does that), but to index them anyway.

korfut

--------- Original Message ---------
DATE: Thu, 17 Jul 2003 07:53:06
From: Tatu Saloranta
To: "Lucene Users List"
Cc:

>On Thursday 17 July 2003 07:20, greg wrote:
>> I have several document sections that are being indexed via the
>> StandardAnalyzer. One of these documents has the line "access, the
>> manager". When searching for the phrase "access manager", this document is
>> being returned. I understand why (at least I think I do): "the" is a stop
>> word and the "," is being removed by the tokenizer. My question is: is
>> there any way I can avoid having this returned in the results? My
>> thoughts were to create a new analyzer that indexes the word "the" (blech,
>> too many of those), or to index the "," in some way (also not good). Any
>> suggestions?
>
>You can also replace all stop words with a "dummy" token ("" might be an ok
>candidate?). That would be similar to indexing "the" (which is probably a
>better idea than indexing ",").
>
>I'm planning to do something similar for paragraph breaks (in case of plain
>text, double linefeed, for HTML <p> etc.), to prevent similar problems.
>
>-+ Tatu +-
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>For additional commands, e-mail: lucene-user-help@jakarta.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
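A minimal, dependency-free Java sketch of the behaviour discussed in this thread, assuming a toy positional index rather than Lucene's actual Analyzer/TokenFilter classes. Dropping stop words makes "access" and "manager" adjacent, so the phrase matches; replacing each stop word with a placeholder token (Tatu's "dummy" token idea) keeps the gap, so it does not. The stop list, the `_STOP_` placeholder, and all class and method names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class StopWordPhraseDemo {
    // Hypothetical, tiny stop list (not the list StandardAnalyzer actually uses).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to"));
    // Hypothetical placeholder that occupies the position of a removed stop word.
    static final String PLACEHOLDER = "_STOP_";

    /** Lowercases and splits on anything that is not a letter or digit, so "," disappears. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Either drops stop words (neighbours become adjacent) or swaps them for PLACEHOLDER. */
    static List<String> analyze(String text, boolean usePlaceholder) {
        List<String> out = new ArrayList<>();
        for (String t : tokenize(text)) {
            if (!STOP_WORDS.contains(t)) {
                out.add(t);
            } else if (usePlaceholder) {
                out.add(PLACEHOLDER);
            }
            // otherwise the stop word is silently dropped
        }
        return out;
    }

    /** True if the query tokens occur at consecutive positions in the document tokens. */
    static boolean phraseMatches(List<String> doc, List<String> query) {
        if (query.isEmpty()) {
            return false;
        }
        for (int start = 0; start + query.size() <= doc.size(); start++) {
            if (doc.subList(start, start + query.size()).equals(query)) {
                return true;
            }
        }
        return false;
    }

    /** Applies the same analysis to the document and the phrase, then checks adjacency. */
    static boolean search(String doc, String phrase, boolean usePlaceholder) {
        return phraseMatches(analyze(doc, usePlaceholder), analyze(phrase, usePlaceholder));
    }

    public static void main(String[] args) {
        String doc = "access, the manager";
        System.out.println("drop stop words, \"access manager\":    "
                + search(doc, "access manager", false));
        System.out.println("placeholder, \"access manager\":        "
                + search(doc, "access manager", true));
        System.out.println("placeholder, \"access the manager\":    "
                + search(doc, "access the manager", true));
    }
}
```

Running it prints `true`, `false`, `true` for the three searches. The same analysis is applied to the document and to the query phrase, so a query that itself contains a stop word ("access the manager") still matches. In this toy model, indexing "the" as an ordinary term (greg's first idea) gives the same result as the placeholder, because any token between the two terms breaks adjacency; the placeholder just collapses all stop words into a single term. One side effect of that collapse: "access a manager" would also match "access the manager".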