lucene-java-user mailing list archives

From Chris Hostetter
Subject Re: product based term combination for BooleanQuery?
Date Mon, 09 Jul 2007 01:36:22 GMT

: At index time, I used a per document boost (over all fields) and a per
: field boost (over all documents). I can certainly factor out the first
: into a query boost, but I was under the impression that if I ever wanted
: to combine fields (eg to index all "name" "alias" and "title" data in a
: single "head" field) then I had to pre-boost the data prior to combining

whoa, whoa, WHOA! ... not at ALL ... I'm not sure how you got that
impression, but when combining different pieces of source data into a single
field Lucene has no idea where those different pieces come from --
boosting a "title" field has no impact whatsoever on a "head" field just
because you happen to put the same piece of text in both "title" and

furthermore, field boosts apply to the entire field value, if you are
making a "head" field containing some text you think of as title and some
text you think of as "name" you can't set a boost just on the "title" part
of the "head" field.

as i said -- lose those field boosts and you should see a *big*
improvement ... in general, i would advise against any attempt to combine
different ideas into a single field for the purpose of improving relevancy
... the only reason i would ever take something like a "title" and an
"author" and combine them into a single field is to make the querying
simpler/faster, not in an attempt to improve relevancy ... query lots of
separate fields using unique query time boosts.
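
a rough sketch of what "separate fields with query time boosts" can look
like, written as a plain query-parser string (field:term^boost). the field
names come from this thread; the boost values and the helper name
boostedQuery are made up for illustration, and the term is not escaped, so
treat this as an assumption-laden example rather than a recommended setup:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldBoostQuerySketch {

    // Builds a query-parser style string with one clause per field,
    // e.g. "title:bush^4.0 name:bush^2.0 alias:bush^1.0".
    // Hypothetical helper: the term is used as-is, with no escaping.
    public static String boostedQuery(String term, Map<String, Float> fieldBoosts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append(':').append(term)
              .append('^').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative boosts only; tune them against real queries.
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("title", 4.0f);
        boosts.put("name", 2.0f);
        boosts.put("alias", 1.0f);
        System.out.println(boostedQuery("bush", boosts));
    }
}
```

because the boosts live in the query and not in the index, they can be
changed without rebuilding the index.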

: it. I tend to believe that these (short) fields contain more relevant
: information than (long) wikipedia articles or other documents.

: Should idf and tf take care of that short/long quality distinction? It
: sounds like you feel they should.

tf/idf will take care of recognizing that the word "John" is really
common, so it's not as significant to the query as "Bush" ... the
lengthNorm function of Similarity is what will help score shorter fields
better than longer fields.
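
for a feel of the numbers: the default lengthNorm in DefaultSimilarity is
1/sqrt(numTerms), so a field with few terms gets a larger norm than a long
one. the sketch below only reproduces that formula in plain java (the class
name is made up, and it ignores the lossy single-byte encoding Lucene uses
when it stores norms):

```java
public class LengthNormSketch {

    // Same formula as the default lengthNorm: 1 / sqrt(number of terms).
    public static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // A 4-term "title"-like field vs. a 10000-term article body.
        System.out.println(lengthNorm(4));     // prints 0.5
        System.out.println(lengthNorm(10000)); // prints 0.01
    }
}
```

so a match in the 4-term field is weighted 50 times more heavily by this
factor than the same match in the 10000-term field, before tf and idf are
taken into account.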

: I'll build an index without the per field boost and see if that produces
: improved results.

try the DisjunctionMaxQuery too .. particularly if you have multiword
queries.  the DisMaxQueryParser in solr that i mentioned before can be
very handy.
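
to illustrate why DisjunctionMaxQuery behaves differently from a plain
BooleanQuery over several fields: it scores a document by the best matching
field, plus a tie-breaker fraction of the other matching fields, instead of
adding all the field scores together. a minimal sketch of that arithmetic,
assuming non-negative per-field scores (the class and method names are
hypothetical, this is not Lucene code):

```java
public class DisMaxScoreSketch {

    // max(fieldScores) + tieBreaker * (sum of the remaining field scores).
    // tieBreaker = 0 means "best field only"; 1 degenerates to a plain sum.
    public static float disMaxScore(float[] fieldScores, float tieBreaker) {
        float max = 0f;
        float sum = 0f;
        for (float s : fieldScores) {
            sum += s;
            max = Math.max(max, s);
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        // Made-up scores for one document in "title", "name" and "alias".
        float[] scores = {2.0f, 1.0f, 1.0f};
        System.out.println(disMaxScore(scores, 0.0f)); // prints 2.0
        System.out.println(disMaxScore(scores, 0.5f)); // prints 3.0
    }
}
```

with a small tie-breaker, a document that matches a query word in three
fields does not automatically outrank a document that matches it very well
in one field, which is the usual motivation for multiword queries.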

