lucene-java-user mailing list archives

From: Morus Walter
Subject: Re: QueryParser refactoring
Date: Tue, 08 Mar 2005 07:29:04 GMT
Erik Hatcher writes:
> > Your changes look great in general, though I find some issues:
> >
> > 1) 'stop OR stop AND stop' where stop is a stopword gives a parse error:
> > Encountered "<EOF>" at line 1, column 0.
> > Was expecting one of:
> >     <NOT> ...
> > ...
> I think you must have tried this in a transient state when I forgot to 
> check in some JavaCC generated files.  Try again.  This one now returns 
> an empty BooleanQuery.
I'm a bit puzzled: I ran JavaCC myself, so stale generated files should
not have mattered. But if it's fixed, I don't care what went wrong.

> > 2) Single-term queries using +/- flags are parsed to a query without
> > the flag
> > +a -> a
> Hmmm.... this is a debatable one.  It's returning a TermQuery in this 
> case for "a".  Is that appropriate?  Or should it return a BooleanQuery 
> with a single TermQuery as required?
I'd prefer it if the query parser parsed a query produced by
query.toString() back to the same query. But that's just a nice-to-have.

> I think having it optimized to a TermQuery makes the most sense.  
> Though, putting it in a BooleanQuery does make this next one simpler...
> > -a -> a
> > While this doesn't make a difference for +a it's a bit strange for -a,
> > OTOH -a isn't a usable query anyway.
> Oops... yeah, you're right.  If it's a single clause right now it 
> doesn't wrap in a BooleanQuery and thus does not take into account the 
> modifier +/-/NOT.   But as you say, this is a bogus query anyway.  I 
> guess the right thing to do is wrap both the +a query as above and the 
> -a query into a BooleanQuery with the modifier set appropriately.
How to handle BooleanQueries that contain only prohibited terms is a
question of its own.
In my fix I chose to silently drop these queries, basically because they
are effectively dropped during querying anyway.
In an application, I handled this by dropping the query and notifying the
user that some part of the query could not be handled and was ignored.
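
The drop-and-notify handling described above could be sketched roughly as follows. This is only an illustration under stated assumptions: ClauseFilter, Clause, Occur and the warning text are hypothetical stand-ins, not Lucene's BooleanQuery API and not the code from the actual fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a parsed boolean query; NOT Lucene's BooleanQuery.
final class ClauseFilter {
    enum Occur { REQUIRED, OPTIONAL, PROHIBITED }

    static final class Clause {
        final Occur occur;
        final String term;

        Clause(Occur occur, String term) {
            this.occur = occur;
            this.term = term;
        }
    }

    // True when there is at least one clause and every clause is prohibited.
    static boolean isProhibitedOnly(List<Clause> clauses) {
        if (clauses.isEmpty()) {
            return false;
        }
        for (Clause c : clauses) {
            if (c.occur != Occur.PROHIBITED) {
                return false;
            }
        }
        return true;
    }

    // Returns the clauses to search with. A prohibited-only list is replaced
    // by an empty list, and a message for the user is appended to warnings.
    static List<Clause> dropIfProhibitedOnly(List<Clause> clauses, List<String> warnings) {
        if (isProhibitedOnly(clauses)) {
            warnings.add("Part of the query contained only excluded terms and was ignored.");
            return new ArrayList<>();
        }
        return clauses;
    }
}
```

Collecting warnings in a list rather than throwing lets the application run the rest of the query and still tell the user what was ignored.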

> > 3) a OR NOT b parses to 'a -b' which is the same as 'a AND NOT b'
> >    IMHO `a OR NOT b' should be `a OR (NOT b)' though Lucene cannot search
> >    that. Maybe it should raise an error...
> Actually it parses like this:
> 	a OR NOT b -> a -b
> 	a AND NOT b -> +a -b
> So they are slightly different, though the effect will be the same.
> >    a OR NOT b AND c (parsed to a -(+b +c)) should IMHO be parsed to
> >    `a (-b +c)'
> Ah, ok.... so NOT gets much higher precedence than I'm currently giving 
> it.  That might take me a while to achieve, but I'll give it a shot.
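
To make the precedence under discussion concrete (NOT binds tighter than AND, which binds tighter than OR), here is a minimal recursive-descent sketch. It is not the JavaCC grammar and not Lucene's QueryParser: PrecedenceSketch and its methods are hypothetical names, tokens are assumed to be whitespace-separated, and parentheses in the input are not supported. The rendering only mimics the +/- notation used in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; NOT Lucene's QueryParser.
final class PrecedenceSketch {
    // Minimal AST node: kind is "TERM", "NOT", "AND" or "OR".
    static final class Node {
        final String kind;
        final String term; // set for TERM nodes only
        final List<Node> kids = new ArrayList<>();

        Node(String kind, String term) {
            this.kind = kind;
            this.term = term;
        }
    }

    private final String[] tokens;
    private int pos = 0;

    private PrecedenceSketch(String input) {
        this.tokens = input.trim().split("\\s+");
    }

    static Node parse(String input) {
        PrecedenceSketch p = new PrecedenceSketch(input);
        Node n = p.orExpr();
        if (p.pos != p.tokens.length) {
            throw new IllegalArgumentException("Unexpected token: " + p.tokens[p.pos]);
        }
        return n;
    }

    private boolean peek(String op) {
        return pos < tokens.length && tokens[pos].equals(op);
    }

    // orExpr := andExpr (OR andExpr)*   (lowest precedence)
    private Node orExpr() {
        Node first = andExpr();
        if (!peek("OR")) return first;
        Node or = new Node("OR", null);
        or.kids.add(first);
        while (peek("OR")) { pos++; or.kids.add(andExpr()); }
        return or;
    }

    // andExpr := notExpr (AND notExpr)*
    private Node andExpr() {
        Node first = notExpr();
        if (!peek("AND")) return first;
        Node and = new Node("AND", null);
        and.kids.add(first);
        while (peek("AND")) { pos++; and.kids.add(notExpr()); }
        return and;
    }

    // notExpr := NOT notExpr | TERM   (highest precedence)
    private Node notExpr() {
        if (peek("NOT")) {
            pos++;
            Node not = new Node("NOT", null);
            not.kids.add(notExpr());
            return not;
        }
        if (pos >= tokens.length || peek("AND") || peek("OR")) {
            throw new IllegalArgumentException("Expected a term at token " + pos);
        }
        return new Node("TERM", tokens[pos++]);
    }

    // Renders in the thread's notation: '+' required, '-' prohibited.
    static String render(Node n) {
        if (n.kind.equals("TERM")) return n.term;
        if (n.kind.equals("NOT")) return "-" + operand(n.kids.get(0));
        boolean and = n.kind.equals("AND");
        StringBuilder sb = new StringBuilder();
        for (Node k : n.kids) {
            if (sb.length() > 0) sb.append(' ');
            if (and) sb.append(k.kind.equals("NOT") ? render(k) : "+" + operand(k));
            else sb.append(operand(k));
        }
        return sb.toString();
    }

    // Bare terms are printed as-is; anything else is wrapped in parentheses.
    private static String operand(Node n) {
        return n.kind.equals("TERM") ? n.term : "(" + render(n) + ")";
    }
}
```

With this sketch, `PrecedenceSketch.render(PrecedenceSketch.parse("a OR NOT b AND c"))` yields `a (-b +c)`, `a AND NOT b` yields `+a -b`, and `a OR NOT b` yields `a (-b)`, which keeps the NOT attached to b alone.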
