lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <markrmil...@gmail.com>
Subject Re: Technology Preview of new Lucene QueryParser
Date Thu, 11 Jan 2007 23:18:56 GMT

> : > so do you convert A ! B ! C into a three clause boolean query, or a two
> : > clause BooleanQuery that contains another two clause BooleanQuery?
> : >
> : It becomes a three clause boolean query...would there be a difference in
> : scoring? I assumed not and it used to make a boolean that contained
> : another boolean...these days it checks to see if its in a chain of the
> : same operator and makes only one boolean.
>
> there is in fact a difference in score ... a big difference depending
> on how the coordFactor comes into play.  your three-clause approach makes
> sense to me as the "right" approach, but your "in a chain of the same
> operator" comment scares me ... how does "A | B | C ! D ! E" get parsed?
> I would assume it should result in the QueryParser equivilent of
> "A B C -D -E" ... is there any way to produce a the same underlying
> BooleanQuery using your syntax?
>   
This sounds troubling to me now :) I may need to clear up my 
understanding of this and rework the parser:
"A | B | C ! D ! E" wold get parsed as allFields:a allFields:b 
(+allFields:c -allFields:d -allFields:e)
This is because ! binds tighter than |...
Sounds like I need to bone up on how I thought this query would operate. 
I set up this logic back when I was new to Lucene and have not 
considered it since. Seems as though the hits will be right but perhaps 
the scoring will not be correct?
> : Just a boolean right now. I will have to look at DisjuntionMaxQuery.
> : Currently its just a boolean: +field1:foo +field2:foo
>
> hmmm... so field1,field2(foo) requires that foo match on both field1 and
> field2, even if you've used it in this context...
>
>                 fieldX(bar) | field1,field2(foo)
>
> ...it seems like a shortcut for "match on foo in ANY of the following
> fields" would be needed in more cases then a shortfut for "match on foo in
> ALL of the following fields"
>   
I was mistaken: it is actually field1:foo field2:foo. OR instead of AND. 
Sorry about that. Obviously it looks like I should be looking into 
DisjunctionMaxQuery instead though.
> : > incidently: what was there a motivating factor behind the mixed use of
> : > both ~ and : to denote slop?
> : >
> : ':' is for slop on a phrase query. "the car is burning so get out":2
> : will allow for each word to be within 2.
> : '~' is a binary operator...mark ~4 postman...or say: (mark ~5 (horse &
> : car) ~6 tom brady | "hard knocks dude":3) ~6 garbage
> :
> : Phrase slop could be specified with the '~' op too: the ~2 car ~2 is
> : ~burning ~2 so ~2 get ~2 out : but that is a pain in the butt.
>
> you kind of lost me there ... i get that ~ is a binary operator, but in
> both cases the intent is to say "these words must appear near eachother"
> ...s oi'm wondering why you cose to use "hard knocks dude":3 instead of
> "hard knocks dude"~3 .... oh wiat, i think i get it ... was it to
> eliminate ambiguity of something like ("hard knocks dude" ~3 foo) ...
> is the whitespace arround binary operators optional?
>   
Eliminating ambiguity was the intention. You think they should be the same?
The whitespace is not optional. I chose this route so that you would be 
able to query m&m without escaping the &..instead you would use m & m 
for an AND search.
> -Hoss
>   
I can't thank you enough even for this brief exchange Hoss. You are a 
tremendous help. I will be using this system in a production environment 
and need it to be perfect before I am done.
The booleanquery question you brought up seems very troubling.

- Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message