Date: Tue, 22 Dec 2009 23:38:50 +0100
Subject: Re: Combine WildcardQuerys
From: Simon Willnauer
To: general@lucene.apache.org

Claudio,

I might have confused you a bit with my last reply, so I will try to elaborate.

Internally, Lucene uses a method called Query#rewrite() that typically rewrites a query into a "primitive" query, such as a BooleanQuery consisting of TermQuery instances. In most cases this is completely hidden from the API user, and it is only called directly to solve very specific and usually advanced problems.

In Lucene 2.4, in order to be scored, a WildcardQuery rewrites to a BooleanQuery containing one TermQuery for each term in the index that matches the wildcard pattern passed to it. If hundreds or thousands of terms match the wildcard ("b*" is likely to match a lot of terms in the index), the BooleanQuery exceeds its maximum clause count and Lucene raises a TooManyClauses exception (see http://lucene.apache.org/java/2_4_0/api/core/org/apache/lucene/search/BooleanQuery.TooManyClauses.html for details).

In Lucene 2.9 this can be handled differently, with a rewrite method that does not rewrite to a BooleanQuery but to, for instance, a ConstantScoreQuery wrapping a filter. To use this feature you need to upgrade to at least Lucene 2.9. See http://lucene.apache.org/java/3_0_0/api/core/org/apache/lucene/search/MultiTermQuery.html and friends for details.
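To make the difference between the two rewrite strategies more concrete, here is a small Java sketch. It is a toy model and does not call Lucene at all: the class WildcardRewriteSketch, its methods, and the TreeMap-based "index" are made up for illustration, and it only handles trailing-wildcard patterns such as "b*". The only value taken from Lucene is the limit of 1024, which is the default maximum clause count of BooleanQuery.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: none of these types are Lucene classes.
class WildcardRewriteSketch {

    // Stand-in for BooleanQuery's clause limit (Lucene's default is 1024).
    static final int MAX_CLAUSE_COUNT = 1024;

    // Hypothetical inverted index for one field: term -> ids of documents containing it.
    // Creates `distinctTerms` terms starting with "b" plus one unrelated term.
    static TreeMap<String, int[]> buildIndex(int distinctTerms) {
        TreeMap<String, int[]> index = new TreeMap<>();
        for (int i = 0; i < distinctTerms; i++) {
            index.put("b" + i, new int[] { i });
        }
        index.put("anna", new int[] { distinctTerms });
        return index;
    }

    // Boolean-style rewrite: one clause per matching term.
    // Fails once the number of matching terms exceeds the clause limit.
    static List<String> rewriteToClauses(TreeMap<String, int[]> index, String prefix) {
        List<String> clauses = new ArrayList<>();
        for (String term : index.tailMap(prefix, true).keySet()) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            if (clauses.size() >= MAX_CLAUSE_COUNT) {
                throw new IllegalStateException("too many clauses, limit is " + MAX_CLAUSE_COUNT);
            }
            clauses.add(term);
        }
        return clauses;
    }

    // Constant-score-style rewrite: collect matching documents into a bit set.
    // No per-term clause is kept, so there is no clause limit to hit,
    // but every hit is treated the same (no per-term scoring).
    static BitSet rewriteToDocSet(TreeMap<String, int[]> index, String prefix) {
        BitSet docs = new BitSet();
        for (Map.Entry<String, int[]> entry : index.tailMap(prefix, true).entrySet()) {
            if (!entry.getKey().startsWith(prefix)) {
                break;
            }
            for (int doc : entry.getValue()) {
                docs.set(doc);
            }
        }
        return docs;
    }

    public static void main(String[] args) {
        TreeMap<String, int[]> index = buildIndex(5000);
        try {
            rewriteToClauses(index, "b");
        } catch (IllegalStateException e) {
            System.out.println("boolean-style rewrite failed: " + e.getMessage());
        }
        System.out.println("doc-set rewrite matched " + rewriteToDocSet(index, "b").cardinality() + " documents");
    }
}
```

The trade-off shown here is the one described above: the clause-based expansion keeps one entry per matching term and breaks on short prefixes, while the doc-set variant scales with the number of matching terms but gives up individual scores.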
Simon

On Tue, Dec 22, 2009 at 2:22 PM, Simon Willnauer wrote:
> Hi Claudio,
> your query setup is fine for what you are trying to do, but as you are
> using wildcards Lucene internally rewrites your WildcardQuery again
> into another boolean query containing every term starting with your
> prefix "b". If you use such a small prefix Lucene will likely create
> tons of boolean clauses, one for each term starting with your prefix.
> Yet, BooleanQuery has a limitation (maxClauseCount) that will trigger
> the exception you are facing once you hit the clause count limit. You
> can try to raise the limit in BooleanQuery but you will very likely
> end up with bad search performance.
> Lucene 2.9 provides alternative rewrite methods for multi-term queries
> like WildcardQuery that perform way better than the plain boolean
> rewrite method. To achieve faster wildcard queries you will end up with
> a constant score instead of the normal Lucene score a BooleanQuery
> would create. So each hit will have the same constant score assigned
> for your WildcardQuery.
>
> Look at org.apache.lucene.search.MultiTermQuery.RewriteMethod for details.
>
> hope that helps
>
> simon
>
> On 12/22/09, Claudio Deluca wrote:
>> Hello,
>>
>> We have currently implemented a search for persons by surname and forename with
>> Lucene 2.4.1.
>> If both search fields are filled, then we combine the WildcardQuerys in a
>> BooleanQuery.
>>
>> BooleanQuery theQuery = new BooleanQuery();
>> theQuery.add(new WildcardQuery(new Term("surname", "foo")), Occur.MUST);
>> theQuery.add(new WildcardQuery(new Term("forename", "b*")), Occur.MUST);
>> LuceneSearcherFactory theSearcherFactory =
>> LuceneSearcherFactory.getInstance();
>> Searcher theSearcher = theSearcherFactory.getSearcher();
>> Query theRewritten = theSearcher.rewrite(theQuery);
>>
>> In the database there is exactly one Person with surname "foo". When I
>> comment out the second term (forename), the search works fine.
>> If I run the search including the term "forename" "b*", the Searcher throws a
>> TooManyClauses exception while trying to rewrite the Query.
>> While rewriting, the searcher seems to find too many possibilities for
>> forenames beginning with "b".
>>
>> How do I have to combine the terms so that the Lucene search works properly?
>>
>> Thanks,
>> Claudio
>>
>