lucene-general mailing list archives

From pagod <>
Subject Preliminary, fundamental question about the demo
Date Mon, 08 Sep 2008 08:16:37 GMT


I just started with Lucene today, and the first thing I did was try out the
small demo. I followed the instructions in "Getting started - Building and
Installing the Basic Demo" to the letter -- I downloaded the JAR files
(2.3.2), unpacked and launched the indexer on the src directory -- worked
fine, indexed all java files in the directory and its subdirectories. I
didn't try to search for a swearword, but I did try to search for "vector".
The fact that I got only one result whereas the demo says I should get a
bunch of them isn't really the problem. The problem is that I got only one
result although the word "vector" appears in TWO documents (I checked that
with grep).

When I enter my query, I get a very clear answer: 
Enter query:
Searching for: vector
1 total matching documents
1. src/demo/org/apache/lucene/demo/

grep's version:
[silenos:apache/lucene/demo] veda> pwd
[silenos:apache/lucene/demo] veda> grep -i vector * */*
   * are all identical, then single norm vector may be shared. */
html/  private java.util.Vector jj_expentries = new
[silenos:apache/lucene/demo] veda>

So my question is a very easy one: what happened? Is there special
processing for Java files, like for HTML documents, which leaves comments
out? Is it a bug only in the "demo" part of this small program (this would
be surprising, as other queries seem to be working fine)? Is there actually
a way I can check the content of my index -- what files were actually
indexed -- or search for one file in particular? A bit like a field search,
but with the URI of the file itself (though I think I read this is
implementation-dependent, meaning one could do it programmatically, but
it's not in the demo, right?).

Anyway, thanks for your answers. I hope there is a good one to this question,
because I'd feel rather disappointed if a search engine so obviously ignores some
