lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From javier muguruza <>
Subject identifier field as keyword or unindexed
Date Mon, 07 Mar 2005 17:49:09 GMT
Hi all,

We index our documents in the following way:

        doc = new Document();
        // mailid
        doc.add(Field.UnStored("body", textb));

mid is a unique identifier, and body contains long pieces of text to be indexed.
And later make searches on the body field, the mid allows us to find a
file on the filesystem with a compressed (and digitally signed)
version of the original body indexed.
Our way to work in a query in our app is this:
1. first we make a search in a db (for many different reasons) that
returns a number (from 0 to thousands) of mid
2. we use lucene to search for some text in many indexes, this returns
a second list of mid
3. we return the result as the intersection of both lists.

This is working fine right now, but wonder wether we are not using
lucene to the fullest, cause we could also store mid as a keyword
(instead of unindexed), and add the condition (AND mid==[any mid from
our step 1]) to the lucene query we run. My questions are:

1. Is there a limit in the number of conditions I can add to a query??
Sometimes we have 10 mids, other times we have thousands of them so we
would have to add: AND (mid:mid1 OR mid:mid2 ... OR mid:mid10000).
Probably there is a limit, and we could only apply the mid conditions
when the number or mids returned by step 1 is smaller than that limit?
2. As the mid is a unique identifier (I guest lucene does not care
about that right?), and the condition on the mid woudl be ANDed to the
text query conditions, will it be faster for lucene to look first in
the mid field and dont do the text lookup if the mid condition is not
fullfilled? I dont know wether I am clear enough...Will I get some
benefit on the queries by adding some additional conditions or the
cost of adding another field to index will not pay off? Maybe it
depends on the number of documents? Maybe it would be best to set mid
as a keyword just in case, and add it as conditions later if the
searches take too long?

thanks for any though on that

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message