Date: Fri, 7 Aug 2009 9:04:18 -0400
To: java-user@lucene.apache.org
Cc: Ian Lea
Subject: Re: Why does this search succeed with web app, but not Luke?

Ian,

I just re-confirmed that StandardAnalyzer is used in both my indexer app and in the query/search web app.

The actual file paths look like:

C:\lucene-devel\dat\xxxxxxxxxxxxxxxx.dat

or

C:\lucene-devel\data\testdir\\xxxxxxxxxxxxxxxx.dat

For field "path", Luke shows:

lucene
data
c
devel
dat
testdir
xxxxxxxxxxxxxxxxxxxxxxxxxxx
.
.
zzzzzzzzzzzzzzzzzzzzzzzzzzzz

where "xxxxxxxxxxxxxxxx" and "zzzzzzzzzzzzzzzzz" are the left part (to the left of the ".") of filenames.

So, it seems like you're correct, that what you're seeing is the opposite from what I'm seeing :(??

Again, the actual code in my indexer has:

doc.add(new Field("path", f.getPath(), Field.Store.YES, Field.Index.ANALYZED));

(and again, the indexer uses StandardAnalyzer).

Is that different from what you did in your "little" index test?

Jim

---- Ian Lea wrote:
> It is a good general assumption that Luke is correct.
>
> Can you confirm that you are using StandardAnalyzer everywhere, for
> indexing and searching? This sort of issue is often caused by using
> different analyzers.
>
> What does Luke show as the indexed terms for path? In a little index
> I've just created with StandardAnalyzer and file paths Luke is showing
> xxx.yyy as a term and not xxx. The opposite to what you have.
>
> There was a thread yesterday about acronyms which might be relevant.
> As might writing a tiny self-contained program that indexes a few
> paths and displays the terms that have been indexed and runs a few
> searches.
>
>
> --
> Ian.
>
>
> On Fri, Aug 7, 2009 at 5:36 AM, wrote:
> > Hi Phil,
> >
> > Well, kind of... but...
> >
> > Then, why, when I do the search in Luke, do I get the results I cited:
> >
> > xxxx  ==> succeeds
> >
> > xxxx.yyy  ==> fails (no results)
> >
> > I guess that I've been assuming that the search in Luke is "correct" and I've been using that to "test my understanding", but maybe that's an invalid assumption?
> >
> > Jim
> >
> >
> > ---- Phil Whelan wrote:
> >> Hi Jim,
> >>
> >> > As I said, based on the terms in Luke, I would have expected a web app query on:
> >> >
> >> > path:file-1-2
> >> >
> >> > to succeed, and a query on:
> >> >
> >> > path:file-1-2.dat
> >> > to fail.
> >> >
> >> > But, instead both of those succeed when I do a web query.
> >>
> >> This query will also pass through the same (hopefully) Analyzer and
> >> will be broken into terms. So the query will actually be for
> >> "file-1-2" and "dat" where "file-1-2" is followed immediately by
> >> "dat".
> >>
> >> During indexing, each term's position is stored, so
> >> "C:\dir1\dir2\file-1-1.dat" becomes...
> >> [0] c
> >> [1] dir1
> >> [2] dir2
> >> [3] file-1-1
> >> [4] dat
> >>
> >> "file-1-1" is followed by "dat", so there is a match.
> >>
> >> Does that make sense?
> >>
> >> Cheers,
> >> Phil
> >>
> >> >
> >> > Jim
> >> >
> >> >
> >> > ---- ohaya@cox.net wrote:
> >> >> Phil,
> >> >>
> >> >> Both my indexer and the webapp are basically from the Lucene demos, the indexer starting with the IndexFiles.java demo code, so I think they're both using the StandardAnalyzer.
> >> >>
> >> >> What appears in Luke, when I select "path" is just the filename part, without the extension, i.e., the "xxxx" part.
> >> >>
> >> >> That's why I said in my original post that I was kind of surprised that doing a web query for "path:xxxx.yyy" succeeded, i.e., in the path field in the index, there is no "xxxx.yyy", just "xxxx".
> >> >>
> >> >> Jim
> >> >>
> >> >> ---- Phil Whelan wrote:
> >> >> > Hi Jim,
> >> >> >
> >> >> > Are you using the same Analyzer for indexing and searching? xxxx.yyy
> >> >> > will be seen as a HOSTNAME by StandardAnalyzer, which will keep it as
> >> >> > one term, whereas another indexer might split this into 2 terms. This
> >> >> > should not matter either way as long as you are using the same
> >> >> > Analyzer for both indexing and searching.
> >> >> >
> >> >> > I would expect this to pass unless you are using NOT_ANALYZED, or the
> >> >> > WhitespaceAnalyzer, or something else that would not split on "/".
> >> >> >     path:xxxx.yyy
> >> >> >
> >> >> > In Luke, do you see 2 terms "xxxx" and "yyy", or just "xxxx.yyy", or
> >> >> > something else?
> >> >> >
> >> >> > Thanks,
> >> >> > Phil
> >> >> >
> >> >> > On Thu, Aug 6, 2009 at 1:03 PM, wrote:
> >> >> > > Hi,
> >> >> > >
> >> >> > > In my indexer app (based on the IndexFiles.java demo), I am adding the "path" field:
> >> >> > >
> >> >> > >    doc.add(new Field("path", f.getPath(), Field.Store.YES, Field.Index.ANALYZED));
> >> >> > >
> >> >> > > Per Luke, the full path (e.g., "c:\....\xxxx.yyy") gets parsed, and one of the terms (again, per Luke) is "xxxx", i.e., the actual file name, but without the extension.
> >> >> > >
> >> >> > > Then, when I search with Luke for "path:xxxx", that succeeds, as expected, and when I search with Luke for "path:xxxx.yyy", that fails, as expected.
> >> >> > >
> >> >> > > But, if I search using the demo web app, for "path:xxxx.yyy", it succeeds.
> >> >> > >
> >> >> > > Since the Luke search for "path:xxxx.yyy" fails, I don't understand why the web app search for "path:xxxx.yyy" would succeed?
> >> >> > >
> >> >> > > Thanks,
> >> >> > > Jim
> >> >> >
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
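
For reference, Phil's position-based explanation quoted above can be sketched as a tiny standalone program. This is only a rough illustration under stated assumptions: it does not use Lucene at all, the class PathPhraseSketch and its methods are hypothetical names, and the toy analyze() splitter is NOT StandardAnalyzer. It is simply chosen so that it reproduces the term list in Phil's example. It compares a query that goes through the same analysis as the field (and must match at consecutive positions) with a lookup of the raw, unanalyzed query string as a single term.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Toy illustration only: no Lucene involved, and analyze() is NOT StandardAnalyzer.
final class PathPhraseSketch {

    // Hypothetical analyzer: lowercases, then splits on '\', '/', ':', '.' and whitespace.
    // The list index of each term plays the role of its stored position.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String piece : text.toLowerCase(Locale.ROOT).split("[\\\\/:.\\s]+")) {
            if (!piece.isEmpty()) {
                terms.add(piece);
            }
        }
        return terms;
    }

    // The query is analyzed the same way as the field; it matches only if its
    // terms appear in the field at consecutive positions, in the same order.
    static boolean matchesPhrase(String fieldValue, String query) {
        List<String> indexed = analyze(fieldValue);
        List<String> wanted = analyze(query);
        return !wanted.isEmpty() && Collections.indexOfSubList(indexed, wanted) >= 0;
    }

    // The query string is used as-is and must equal one indexed term exactly.
    static boolean matchesRawTerm(String fieldValue, String query) {
        return analyze(fieldValue).contains(query);
    }

    public static void main(String[] args) {
        String path = "C:\\dir1\\dir2\\file-1-1.dat";
        System.out.println(analyze(path));                         // [c, dir1, dir2, file-1-1, dat]
        System.out.println(matchesPhrase(path, "file-1-1.dat"));   // true
        System.out.println(matchesRawTerm(path, "file-1-1.dat"));  // false
    }
}
```

In this sketch the analyzed query "file-1-1.dat" becomes the two terms "file-1-1" and "dat", which sit at positions 3 and 4, so it matches, while the raw string "file-1-1.dat" matches nothing because no single indexed term has that value. Whether either search path in Luke or the demo web app actually behaves like one of these two cases is not established in this thread.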