Date: Thu, 17 Jul 2003 09:52:56 -0700
From: "none none"
Reply-To: korfut@lycos.com
To: lucene-dev@jakarta.apache.org
Subject: Re: interesting phrase query issue

I believe that searching for "access manager" should return no hits if the document contains "access, the manager"
because the documents are different. I know there is a stop word in between, so my opinion is: skip "the" and all the other stop words at search level rather than at index level (Google does that), but index them anyway.

korfut

--------- Original Message ---------
DATE: Thu, 17 Jul 2003 07:53:06
From: Tatu Saloranta
To: "Lucene Users List"
Cc:

>On Thursday 17 July 2003 07:20, greg wrote:
>> I have several document sections that are being indexed via the
>> StandardAnalyzer. One of these documents has the line "access, the
>> manager". When searching for the phrase "access manager", this document
>> is being returned. I understand why (at least I think I do): "the" is a
>> stop word and the "," is being removed by the tokenizer. My question is:
>> is there any way I can avoid having this returned in the results? My
>> thoughts were to create a new analyzer that indexes the word "the"
>> (blick, too many of those), or to index the "," in some way (also not
>> good). Any suggestions?
>
>You can also replace all stop words with a "dummy" token ("" might be an
>OK candidate?). That would be similar to indexing "the" (which is probably
>a better idea than indexing ",").
>
>I'm planning to do something similar for paragraph breaks (a double
>linefeed in plain text, paragraph tags in HTML, etc.) to prevent similar
>problems.
>
>-+ Tatu +-
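The options raised in this thread (greg's "index the word 'the'", Tatu's dummy token, and korfut's "index them anyway") can be compared with a small, self-contained Java sketch. It does not use Lucene's API: it simulates an analyzer as a list of tokens whose list index is the token's position, and a phrase query as "all terms at consecutive positions". The stop list, the `_stop_` dummy token, and all class, enum, and method names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordPhraseDemo {

    // Hypothetical stop list for illustration only (not StandardAnalyzer's actual list).
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "of", "the", "to");

    // Hypothetical stand-in for Tatu's "dummy" token. Underscores are separators for the
    // tokenizer below, so no input text can ever produce this token by accident.
    static final String DUMMY_TOKEN = "_stop_";

    enum Mode {
        DROP,        // remove stop words outright; the remaining words close up
        PLACEHOLDER, // replace every stop word with DUMMY_TOKEN
        KEEP         // index stop words like any other word
    }

    /** Lowercases, splits on runs of non-letter/non-digit characters (so the comma disappears),
        and handles stop words according to mode. A token's index in the list is its position. */
    static List<String> analyze(String text, Mode mode) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) {
                continue; // split() yields a leading "" when the text starts with a separator
            }
            if (!STOP_WORDS.contains(word)) {
                tokens.add(word);
            } else if (mode == Mode.PLACEHOLDER) {
                tokens.add(DUMMY_TOKEN);
            } else if (mode == Mode.KEEP) {
                tokens.add(word);
            }
            // Mode.DROP: the stop word is skipped and leaves no gap behind.
        }
        return tokens;
    }

    /** True if the phrase terms occur at consecutive positions somewhere in the document. */
    static boolean phraseMatches(List<String> doc, List<String> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        for (int start = 0; start + phrase.size() <= doc.size(); start++) {
            if (doc.subList(start, start + phrase.size()).equals(phrase)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            // The query goes through the same analysis as the document.
            List<String> doc = analyze("access, the manager", mode);
            List<String> query = analyze("access manager", mode);
            System.out.println(mode + " " + doc + " matches \"access manager\": "
                    + phraseMatches(doc, query));
        }
    }
}
```

Running `main` prints `DROP [access, manager] matches "access manager": true`, then `false` for `PLACEHOLDER` and `KEEP`: once the stop word occupies a position, the phrase can no longer bridge it. The two fixes trade off differently. With a single shared dummy token, the phrase "access of manager" would also match "access, the manager", because every stop word collapses to the same token. Keeping the real words is exact but stores postings for every occurrence of very common words, which is greg's "too many of those" concern. korfut's suggestion fits the KEEP variant: the index holds the stop words, and the search side can still ignore them for non-phrase queries.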
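Tatu's plan for paragraph breaks follows the same pattern: emit a marker token between paragraphs so that a phrase cannot match the last word of one paragraph followed by the first word of the next. The sketch below is again plain Java rather than Lucene code, handles only the plain-text case (a blank line, i.e. a double linefeed), and uses a hypothetical `_para_` marker and hypothetical names. An HTML version would emit the same marker at paragraph-level tags, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ParagraphBoundaryDemo {

    // Hypothetical boundary marker. Underscores are separators for the word tokenizer
    // below, so ordinary text can never produce this token.
    static final String BOUNDARY = "_para_";

    /** Splits plain text into paragraphs at blank lines (a linefeed, optional whitespace
        including CRs, then another linefeed), tokenizes each paragraph into lowercase
        words, and emits BOUNDARY between consecutive paragraphs. */
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        String[] paragraphs = text.split("\\n\\s*\\n");
        for (int p = 0; p < paragraphs.length; p++) {
            if (p > 0) {
                tokens.add(BOUNDARY); // occupies a position, so phrases cannot span paragraphs
            }
            for (String word : paragraphs[p].toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!word.isEmpty()) {
                    tokens.add(word);
                }
            }
        }
        return tokens;
    }

    /** True if the phrase terms occur at consecutive positions somewhere in the document. */
    static boolean phraseMatches(List<String> doc, List<String> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        for (int start = 0; start + phrase.size() <= doc.size(); start++) {
            if (doc.subList(start, start + phrase.size()).equals(phrase)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = analyze("Request building access\n\nManager approval is required.");
        System.out.println(doc);
        System.out.println(phraseMatches(doc, analyze("access manager")));   // blocked by BOUNDARY
        System.out.println(phraseMatches(doc, analyze("manager approval"))); // same paragraph
    }
}
```

Because the marker is just another token, the phrase matching itself needs no changes; only the analysis step knows about paragraphs.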