lucene-dev mailing list archives

From Chris Hostetter <>
Subject Re: "Advanced" query language
Date Thu, 22 Dec 2005 00:20:19 GMT

I finally got a chance to look at this code today (the best part about the
last day before vacation is that no one expects you to get anything done, so
you can ignore your "real work" and spend time on things that are more
important in the long run), and while I still haven't wrapped my head
around all of it, I wanted to share my thoughts so far on the API...

1) I applaud the pluggable nature of your solution. Looking at the Test
Case, it is easy to see exactly how a service provider could
do things like override the behavior of a <PhraseQuery> to be implemented
as a SpanQuery without their clients being affected at all.  Kudos.

2) Digging into what was involved in writing an ObjectBuilder, I found
the API somewhat confusing.  I was reminded of this exchange you had with

: > While SAX is fast, I've found callback interfaces
: > more difficult to
: > deal with while generating nested object graphs...
: > it normally
: > requires one to maintain state in stack(s).
: I've gone to some trouble to avoid the effects of this
: on the programming model.

As someone who feels very comfortable with Lucene, but has no
practical experience with SAX, I have to say that I don't really feel like
the API has a very clean separation from SAX.

I think that the ideal API wouldn't require people writing ObjectBuilders
to know anything about SAX, or to ever need to import anything from
org.xml.** or javax.xml.**

3) While the *need* to maintain/pass state information should be avoided,
I can definitely think of uses for this framework that may *want* to pass
state information -- both down to the ObjectBuilders that get used in
inner nodes, as well as up to wrapping nodes, and there doesn't seem to be
an easy way to do that.  (It could just be my lack of SAX knowledge, though.)

The best example I can give is if someone (i.e., me) wanted to use this
framework to allow boolean queries to be written like this...

   <BooleanQuery>
      <TermQuery occurs="mustNot" field="contents" value="mustNot"/>
      <UserInput occurs="must">"a phrase" fuzzy~</UserInput>
   </BooleanQuery>

...I want to be able to write a "BooleanClauseWrapperObjectBuilder" that
can be wrapped around any other ObjectBuilder and will return whatever
object it does, but will also check for an "occurs" attribute, and put
that in a state bucket somewhere so that the BooleanQuery has access to it
when adding the Query it gets back.

Going the opposite direction, I'd like to be able to have tags that set
state which is accessible to descendant tags (even if the tags in the
middle don't know anything about that bit of state).  For example:
specifying how much slop should be used by default in phrase queries...

   <StateModifier defaultPhraseSlop="100">
         <PhraseQuery occurs="mustNot" field="contents">
            How Now Brown Cow?
         </PhraseQuery>
   </StateModifier>
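
One way to picture that downward flow: each tag gets its own frame, reads
fall back to the enclosing frame, and writes stay local. The following is
only a rough sketch under that assumption; FrameState and its method names
are made up for illustration and are not part of Lucene or of the code
under discussion.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical frame-based state: reads fall back to the parent, writes stay local. */
class FrameState {
    private final FrameState parent;                          // null for the root frame
    private final Map<String, Object> frame = new HashMap<>();

    FrameState() { this(null); }
    private FrameState(FrameState parent) { this.parent = parent; }

    /** Returns a new child frame that wraps this one. */
    FrameState wrap() { return new FrameState(this); }

    /** Writes only ever touch the current frame. */
    void put(String key, Object value) { frame.put(key, value); }

    /** Reads check the current frame first, then delegate to the parent chain. */
    Object get(String key) {
        if (frame.containsKey(key)) return frame.get(key);
        return parent == null ? null : parent.get(key);
    }

    /** Only what was set in this frame, with no delegation. */
    Map<String, Object> localFrame() { return Collections.unmodifiableMap(frame); }

    public static void main(String[] args) {
        FrameState root = new FrameState();
        FrameState modifier = root.wrap();       // <StateModifier defaultPhraseSlop="100">
        modifier.put("defaultPhraseSlop", 100);
        FrameState middle = modifier.wrap();     // some tag that knows nothing about slop
        FrameState phrase = middle.wrap();       // <PhraseQuery ...>

        System.out.println(phrase.get("defaultPhraseSlop"));                      // 100
        System.out.println(phrase.localFrame().containsKey("defaultPhraseSlop")); // false
        System.out.println(root.get("defaultPhraseSlop"));                        // null
    }
}
```

The tag in the middle never touches defaultPhraseSlop, yet the phrase
handler still sees it, and nothing leaks back up to the root.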

I haven't had a chance to try implementing this, but at a high level, it
seems like all of this should be possible and still easy to use.
Here's a real rough cut at what I've had floating around in the back
of my head (I'm doing this straight into email, pardon any typos or
pseudo code) ...

import java.io.InputStream;
import java.util.*;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;

/** could be implemented with SAX, or DOM, or Pull */
public interface LuceneXmlParser {
    /** this method will call setParser(this) on each handler */
    public void registerHandler(String tag, LuceneXmlHandler h);
    /** primary method for clients, parses the xml and calls processNode
     *  on the root node */
    public Query parse(InputStream xml);
    /** dispatches to the appropriate handler's process method based
     *  on the Node name, may be called by handlers for recursion of children */
    public Query processNode(LuceneXmlNode n, State s);
}
public interface LuceneXmlHandler {
    public void setParser(LuceneXmlParser p);
    /**
     * should return a Query that corresponds to the specified node.
     * may read/modify state in any way it wants ... it is recommended that
     * all implementing methods wrap their state before passing it on when
     * processing children.
     */
    public Query process(LuceneXmlNode n, State s);
}
/**
 * A State is a stack frame that can delegate read operations to another
 * State it wraps (if there is one), but it cannot delegate modifying
 * operations.
 * Classes implementing State should provide a constructor that takes
 * another State to wrap.
 */
public interface State extends Map<String,Object> {
    /** for callers that want to know what's in the immediate stack
     *  frame without any delegation */
    public Map<String,Object> getOuterFrame();
    /** should return a new state that wraps the current state */
    public State wrapCurrentState();
}
/** a very simple api around the most basic xml concepts */
public interface LuceneXmlNode {
    public CharSequence getNodeName();
    public Map<String,String> getAttributes();
    public CharSequence getBodyText();
    public Iterator<LuceneXmlNode> getChildren();
}
/** an example handler for TermQuery */
public class TermQueryHandler implements LuceneXmlHandler {
    LuceneXmlParser p;
    public void setParser(LuceneXmlParser q) { p=q; }
    public Query process(LuceneXmlNode n, State s) {
        Map<String,String> attrs = n.getAttributes();
        return new TermQuery(new Term(attrs.get("field"), attrs.get("value")));
    }
}
/** an example handler for BooleanQuery */
public class BooleanQueryHandler implements LuceneXmlHandler {
    LuceneXmlParser p;
    public void setParser(LuceneXmlParser q) { p=q; }
    public Query process(LuceneXmlNode n, State s) {
        BooleanQuery r = new BooleanQuery();
        String min = n.getAttributes().get("minShouldMatch");
        if (min != null) r.setMinimumNumberShouldMatch(Integer.parseInt(min));
        for (Iterator<LuceneXmlNode> kids = n.getChildren(); kids.hasNext(); ) {
            LuceneXmlNode kid = kids.next();
            State kidState = s.wrapCurrentState();
            Query b = p.processNode(kid, kidState);
            /* default to an optional clause unless the child set "occurs" */
            BooleanClause.Occur o = BooleanClause.Occur.SHOULD;
            if (kidState.getOuterFrame().containsKey("occurs")) {
                o = (BooleanClause.Occur) kidState.getOuterFrame().get("occurs");
            }
            r.add(b, o);
        }
        return r;
    }
}
/**
 * an example handler that can wrap any other handler and give it
 * BooleanClause.Occur awareness
 */
public class BooleanClauseWrapperHandler implements LuceneXmlHandler {
    LuceneXmlParser p;
    LuceneXmlHandler inner;
    public BooleanClauseWrapperHandler(LuceneXmlHandler i) { inner = i; }
    public void setParser(LuceneXmlParser q) { p=q; inner.setParser(q); }
    public Query process(LuceneXmlNode n, State s) {
        Query q = inner.process(n, s);
        if (n.getAttributes().containsKey("occurs")) {
            /* glossing over string parsing to object construction here;
               parseOccur is a hypothetical helper, not part of Lucene */
            s.put("occurs", parseOccur(n.getAttributes().get("occurs")));
        }
        return q;
    }
}
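
To make the upward flow concrete without pulling in Lucene or an XML
parser, here is a toy version of the same wrapper idea. Everything in it is
a stand-in invented for illustration (OccursDemo, Node, Handler, and a
"query" that is just a String); it only tries to show a wrapper leaving
"occurs" in a per-child frame for the parent to pick up, and is not meant
as the real implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-ins (all hypothetical): a "query" is just a String, no Lucene involved. */
class OccursDemo {
    static class Node {
        final String name;
        final Map<String, String> attrs = new HashMap<>();
        final List<Node> kids = new ArrayList<>();
        Node(String name) { this.name = name; }
        Node attr(String k, String v) { attrs.put(k, v); return this; }
        Node kid(Node n) { kids.add(n); return this; }
    }

    interface Handler {
        String process(Node n, Map<String, Object> state);
    }

    /** Leaf handler that knows nothing about "occurs". */
    static class TermHandler implements Handler {
        public String process(Node n, Map<String, Object> state) {
            return n.attrs.get("field") + ":" + n.attrs.get("value");
        }
    }

    /** Wraps any handler and copies the "occurs" attribute into the state. */
    static class OccursWrapper implements Handler {
        private final Handler inner;
        OccursWrapper(Handler inner) { this.inner = inner; }
        public String process(Node n, Map<String, Object> state) {
            String q = inner.process(n, state);
            if (n.attrs.containsKey("occurs")) state.put("occurs", n.attrs.get("occurs"));
            return q;
        }
    }

    /** Parent handler: gives each child a fresh frame, then reads what the child left in it. */
    static class BooleanHandler implements Handler {
        private final Map<String, Handler> handlers;
        BooleanHandler(Map<String, Handler> handlers) { this.handlers = handlers; }
        public String process(Node n, Map<String, Object> state) {
            StringBuilder sb = new StringBuilder();
            for (Node kid : n.kids) {
                Map<String, Object> kidState = new HashMap<>();
                String q = handlers.get(kid.name).process(kid, kidState);
                // Fall back to an optional clause when the child set nothing.
                Object occurs = kidState.containsKey("occurs") ? kidState.get("occurs") : "should";
                if (sb.length() > 0) sb.append(' ');
                sb.append(occurs).append('(').append(q).append(')');
            }
            return sb.toString();
        }
    }

    static String demo() {
        Map<String, Handler> handlers = new HashMap<>();
        handlers.put("TermQuery", new OccursWrapper(new TermHandler()));
        Node root = new Node("BooleanQuery")
            .kid(new Node("TermQuery").attr("occurs", "mustNot")
                .attr("field", "contents").attr("value", "mustNot"))
            .kid(new Node("TermQuery").attr("field", "contents").attr("value", "cow"));
        return new BooleanHandler(handlers).process(root, new HashMap<>());
    }

    public static void main(String[] args) {
        System.out.println(demo()); // mustNot(contents:mustNot) should(contents:cow)
    }
}
```

The leaf handler stays unaware of "occurs"; only the wrapper and the
parent agree on that key, which is the decoupling I'm after.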

...does that make sense?

