lucene-dev mailing list archives

From Erik Hatcher <>
Subject Re: "Advanced" query language
Date Sat, 03 Dec 2005 02:03:36 GMT
On Dec 2, 2005, at 10:03 AM, mark harwood wrote:
> There seems to be a growing gap between Lucene
> functionality and the query language offered by
> QueryParser (eg no support for regex queries, span
> queries, "more like this", filter queries,
> minNumShouldMatch etc etc).

At least with a couple of these it would be sensible to subclass  
QueryParser and override some getters to create other types of  
queries.  For example, if you need ordered sloppy phrase queries you  
could create a SpanNearQuery instead of a PhraseQuery.  Likewise with  
RegexQuery instead of WildcardQuery.

Question - since when is "more like this" a Query?  Should it be?

Your points below are well taken though....

> Closing this gap is hard when:
> a) The availability of Javacc+Lucene skills is a
> bottleneck

job security?!  :)

I've been doing a lot of JavaCC work this year, and it has been a  
humbling learning curve, and I barely feel capable with it.

One interesting project I just came across is JParsec: http:// - perhaps this could be a much simpler way than   
using JavaCC.

> b) The syntax of the query language makes it difficult
> to add new features eg rapidly running out of "special
> characters"

This is the biggest issue of all.  What do humans want to type in  
order to achieve sophisticated queries?

Apple has it pretty nicely implemented with additive builders (such  
as with Finder, Mail rules, and smart playlists in iTunes) but they  
don't support nested expressions, only "all" or "any" of the  

> I don't think extending the existing query
> parser/language is necessarily useful and I see it
> being used purely to support the classic "simple
> search engine" syntax.

I concur. Tacking more into QueryParser is not going to make most  
users happy.  I think there may be too many bells and whistles in it  

> Unfortunately the fall-back position for applications
> which require more complex queries is to "just write
> some Java code to instantiate the Query objects
> programmatically."

I've not found a generalization of how queries are entered into the  
system across the applications I've worked on, though.  Every query  
interface has been custom.

> This is OK but I think there is
> value in having an advanced search syntax capable of
> supporting the latest Lucene features and expressed in
> XML. It's worth considering why it's useful to have a
> String-representable form for queries:
> 1) Queries can be stored eg in audit logs or "saved
> queries" used for tasks like auto-categorization
> 2) Clients built in languages other than Java can
> issue queries to a Lucene server
> 3) I can decouple a request from the code that
> implements the query when distributing software e.g my
> applet may not want Lucene dragging down to the client

This is an interesting proposal, and one that has a lot of merit in  
how you've explained it.

> We can potentially use XML in the same way ANT does
> i.e. a declarative way of invoking an extensible list
> of Java-implemented features.

I've told many developers that the answer to almost all Java  
questions lies within the source code to Ant :)

> A query interpreter is
> used to instantiate the configured Java Query objects
> and populates them with settings from the XML in a
> generic fashion (using reflection) eg:
> ....
>    <MoreLikeThis minNumberShouldMatch="3"
> maxQueryTerms="30">

We're back to MoreLikeThis - it's not currently a Query subclass.   
How do you envision this sort of thing fitting in if it's not a Query?
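
For the sake of discussion, the reflection-driven interpreter described in the quoted text could be sketched roughly as below. This is only a minimal illustration using the JDK's own XML parser and reflection: MoreLikeThisSpec and XmlQueryInterpreter are hypothetical names, the spec class is a plain stand-in rather than a Lucene class (MoreLikeThis is not a Query subclass), and the element is written self-closed for the example. Each XML attribute is assumed to map onto a one-argument setter of the same name.

```java
import java.io.StringReader;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical stand-in for a configurable query object; NOT a Lucene class.
class MoreLikeThisSpec {
    private int minNumberShouldMatch;
    private int maxQueryTerms;
    public void setMinNumberShouldMatch(int n) { minNumberShouldMatch = n; }
    public void setMaxQueryTerms(int n) { maxQueryTerms = n; }
    public int getMinNumberShouldMatch() { return minNumberShouldMatch; }
    public int getMaxQueryTerms() { return maxQueryTerms; }
}

// Hypothetical interpreter: element name -> class, attribute -> setter call.
class XmlQueryInterpreter {
    // Registry of known element names, in the spirit of Ant's task definitions.
    private static final Map<String, Class<?>> TYPES = new HashMap<>();
    static {
        TYPES.put("MoreLikeThis", MoreLikeThisSpec.class);
    }

    // Parses a single element and returns the configured object.
    static Object build(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            Class<?> type = TYPES.get(root.getTagName());
            if (type == null) {
                throw new IllegalArgumentException("Unknown element: " + root.getTagName());
            }
            Object target = type.getDeclaredConstructor().newInstance();
            NamedNodeMap attrs = root.getAttributes();
            for (int i = 0; i < attrs.getLength(); i++) {
                Node attr = attrs.item(i);
                applySetter(target, attr.getNodeName(), attr.getNodeValue());
            }
            return target;
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot interpret query XML", e);
        }
    }

    // Finds setXxx(oneArg) for attribute "xxx" and invokes it with a converted value.
    private static void applySetter(Object target, String attr, String value) throws Exception {
        String name = "set" + Character.toUpperCase(attr.charAt(0)) + attr.substring(1);
        for (Method m : target.getClass().getMethods()) {
            if (m.getName().equals(name) && m.getParameterCount() == 1) {
                m.invoke(target, convert(m.getParameterTypes()[0], value));
                return;
            }
        }
        throw new IllegalArgumentException("No setter for attribute: " + attr);
    }

    // Converts the attribute text to the setter's parameter type (a few types only).
    private static Object convert(Class<?> type, String value) {
        if (type == int.class) return Integer.valueOf(value);
        if (type == float.class) return Float.valueOf(value);
        if (type == boolean.class) return Boolean.valueOf(value);
        return value; // fall back to passing the raw String
    }

    public static void main(String[] args) {
        MoreLikeThisSpec spec = (MoreLikeThisSpec) build(
                "<MoreLikeThis minNumberShouldMatch=\"3\" maxQueryTerms=\"30\"/>");
        System.out.println(spec.getMinNumberShouldMatch() + " / " + spec.getMaxQueryTerms());
    }
}
```

The registry map is the extension point here: supporting a new element would mean registering one more class, without touching any grammar. Nested elements (for example boolean clauses) are left out of this sketch.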

> Do people feel this would be a worthwhile endeavour?

I think a way to get a query to/from XML is a good idea.  Perhaps the  
XML serialization feature of JDK 1.4 (or is it 1.5?) is sufficient  
for this?  Maybe not though - and there are plenty of handy helpers,  
ranging from raw reflection tricks like Ant's to something like  
Digester or Castor.  I wouldn't recommend reinventing the XML  
de/serialization aspect of this.
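
As a rough idea of what the JDK's built-in XML serialization looks like, here is a small round trip with java.beans.XMLEncoder and XMLDecoder (both present since JDK 1.4). TermQueryBean and QueryXmlRoundTrip are hypothetical names invented for this sketch, not Lucene classes. The encoder works on JavaBeans conventions (public class, public no-arg constructor, getter/setter pairs), and whether Lucene's own Query classes fit those conventions is not something this sketch settles. The try-with-resources syntax assumes Java 7 or later.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

class QueryXmlRoundTrip {
    // Hypothetical JavaBean describing a single-term query; NOT a Lucene class.
    public static class TermQueryBean {
        private String field = "";
        private String text = "";
        private float boost = 1.0f;

        public TermQueryBean() {}

        public String getField() { return field; }
        public void setField(String field) { this.field = field; }
        public String getText() { return text; }
        public void setText(String text) { this.text = text; }
        public float getBoost() { return boost; }
        public void setBoost(float boost) { this.boost = boost; }
    }

    // Serializes a bean to XML bytes; closing the encoder flushes the document.
    static byte[] toXml(Object bean) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(bean);
        }
        return out.toByteArray();
    }

    // Reads the first object back from XML bytes produced by toXml.
    static Object fromXml(byte[] xml) {
        try (XMLDecoder decoder = new XMLDecoder(new ByteArrayInputStream(xml))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        TermQueryBean original = new TermQueryBean();
        original.setField("title");
        original.setText("lucene");
        original.setBoost(2.0f);

        byte[] xml = toXml(original);
        TermQueryBean copy = (TermQueryBean) fromXml(xml);
        System.out.println(copy.getField() + ":" + copy.getText() + "^" + copy.getBoost());
    }
}
```

Note that the encoder only writes properties that differ from a freshly constructed instance, so the stored XML stays compact but does depend on the bean's defaults.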

> I'm not sure if enough people feel pain around the
> points 1-3 outlined above to make it worth pursuing.

I don't see where I would use this capability just yet, but I do see  
it as useful in the contexts you provided.

I'd also be interested in effort towards an Apple-like query builder.

