lucene-solr-user mailing list archives

From Mark Bennett
Subject Re: Overview of Query Parsing API Stack? / Dismax parsing, new 1.4 parsing, etc.
Date Mon, 24 Aug 2009 18:09:08 GMT
Thanks Hoss and Yonik.

Hoss, you had a particularly pertinent passage:
> ... because the normal Lucene QueryParser uses whitespace ...
> and breaks up the input on the whitespace boundaries
> before it ever passes those chunks ... to the analyzers

This is EXACTLY what the issue is.  At first I thought it was the result of
using dismax, but from what you said, I'm guessing it affects all queries.
Does somebody have a "worked" example of engineering around it?


I was surprised by your IBM comments, because based on what they had
presented at the meetup, I also thought it would be more "granular".  Have
you chatted with them to confirm?

Mark Bennett / New Idea Engineering, Inc. /
Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513

On Thu, Aug 20, 2009 at 7:16 PM, Chris Hostetter wrote:

> : Subject: Overview of Query Parsing API Stack? / Dismax parsing,
> :     new 1.4  parsing, etc.
> Oh, what i would give for time to sit and document in depth how some of
> this stuff works (assuming i first had time to verify that it really does
> work the way i think)
> The nutshell answer is that as far as solr (1.4) is concerned, the main
> unit of "query parsing" is a QParser ... lots of places in the code base
> may care about parsing different strings for the purposes of producing a
> Query object, but ultimately they all use a QParser.
> QParsers are plugins that you can configure instances of in your
> solrconfig.xml and assign names to.  by default, all of the various pieces
> of code in solr that do any sort of query related parsing use some basic
> convention to pick a QParser by name -- so StandardRequestHandler uses the
> QParser named "lucene" for parsing the "q" param, while
> DisMaxRequestHandler uses a QParser named "dismax" for "q", and "func" for
> the "bf" param.  so if you wanted to make some change so that *any* code
> path anywhere attempting to use the lucene syntax got your custom query
> parsing logic, you could configure a QParser with the name "lucene" and
> override the default.
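
To check my understanding of the lookup-by-name convention, here is a toy
model in Python. It is not Solr code: the registry, the function names, and
the placeholder parsers are all made up for illustration. The point is only
that registering a parser under an existing name such as "lucene" replaces
the default for every caller that asks for that name.

```python
# Toy model of "pick a QParser by name". Nothing here is Solr API;
# the registry and the parser functions are hypothetical stand-ins.

def lucene_parser(qstr):
    return ("lucene", qstr)

def dismax_parser(qstr):
    return ("dismax", qstr)

# Defaults, keyed by the parser names mentioned in the email.
registry = {"lucene": lucene_parser, "dismax": dismax_parser}

def parse(qstr, parser_name):
    """Look the parser up by name, as each request handler is described as doing."""
    return registry[parser_name](qstr)

def my_custom_parser(qstr):
    # Hypothetical custom logic: lowercase the raw query string.
    return ("custom", qstr.lower())

# Re-registering under the name "lucene" overrides the default for all callers.
registry["lucene"] = my_custom_parser

standard_q = parse("Foo Bar", "lucene")   # now handled by my_custom_parser
dismax_q = parse("Foo Bar", "dismax")     # unaffected by the override
```

In real Solr the registration step is, as far as I know, a
<queryParser name="..." class="..."/> entry in solrconfig.xml rather than
anything done in code.
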
> The brilliantly confusing magic comes into play when strings to be parsed
> start with the "local params" syntax (ie: "{!foo a=f b=z}blah blah") ...
> that tells the parsing code to override whatever QParser it would have
> used for that string, and to pass everything after the "}" character to the
> parser named "foo", with a=f and b=z added to the list of SolrParams it's
> already got (from the query string, or default params in solrconfig,
> etc...)
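
A rough sketch of how I read the local params step, again as a Python toy
rather than Solr's actual parser. The function name split_local_params is
hypothetical, and I am assuming the key=value pairs are separated by
whitespace (which is how I understand Solr's syntax) with no quoting or
escaping.

```python
# Simplified sketch of local-params handling; not Solr's actual parser.
# Assumes whitespace-separated key=value pairs and no quoting or escaping.

def split_local_params(qstr, default_parser, params):
    """Return (parser_name, merged_params, remaining_query_string)."""
    if not qstr.startswith("{!"):
        # No local params prefix: keep the default parser and params.
        return default_parser, dict(params), qstr
    end = qstr.index("}")
    tokens = qstr[2:end].split()
    body = qstr[end + 1:]          # everything after "}" goes to the parser
    parser_name = default_parser
    merged = dict(params)          # start from the params we already have
    for i, tok in enumerate(tokens):
        key, sep, value = tok.partition("=")
        if not sep and i == 0:
            parser_name = key      # a bare first token names the parser
        elif sep:
            merged[key] = value    # local key=value pairs are added on top
        # bare tokens in any other position are ignored in this sketch
    return parser_name, merged, body

result = split_local_params("{!foo a=f b=z}blah blah", "lucene", {"rows": "10"})
plain = split_local_params("blah blah", "lucene", {"rows": "10"})
```

Here result names the parser "foo", carries a=f and b=z alongside the
existing params, and leaves "blah blah" as the string to parse, while plain
falls through to the default parser untouched.
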
> For most types of queries, the QParser ultimately uses Lucene's
> "QueryParser" class, or some subclass of it (DisMaxQueryParser used by the
> DisMaxQParserPlugin is a subclass of QueryParser) and 9 times out of 10 if
> people want to customize query parsing without inventing a 100% new
> syntax, they also write a subclass.
> coming in Lucene 2.9 (which is what Solr 1.4 will use) is a completely new
> QueryParser framework, which (i'm told) is supposed to make it much easier
> to create custom query parser syntaxes, but i haven't had time to look at
> it to see what all the fuss is about.  so in theory you could use it to
> implement a new QParserPlugin in Solr 1.4.
> no matter how you ultimately implement code that goes from "String" to
> "Query" you have to be concerned about the type of data in the field that
> the Query object refers to (if it was lowercased at index time, you want to
> lowercase at query time, etc...).  Solr does its best to help query
> parsers out by supporting an <analyzer type="query"/> in the schema.xml so
> that the schema creator can specify how to "analyze" a piece of
> input when building queries, but depending on the query syntax it's not
> always easy to get the behavior you expect from a particular query parser
> / analyzer pair (This part of query parsing typically trips people up when
> dealing with multiword synonyms, or analyzers that don't tokenize on
> whitespace, because the normal Lucene QueryParser uses whitespace as part
> of its markup, and breaks up the input on the whitespace boundaries
> before it ever passes those chunks of input to the analyzers)
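
This is the part that bites me, so here is a toy illustration of the effect
in Python. The analyze function and the SYNONYMS table are hypothetical
stand-ins, not Lucene code. It shows why a multiword synonym rule can never
fire if the parser splits on whitespace before the analyzer sees the input.

```python
# Toy illustration; the "analyzer" is a hypothetical stand-in, not Lucene code.

SYNONYMS = {("new", "york"): "ny"}  # hypothetical multiword synonym rule

def analyze(text):
    """Lowercase, tokenize on whitespace, then collapse known two-word phrases."""
    words = text.lower().split()
    out, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in SYNONYMS:
            out.append(SYNONYMS[pair])
            i += 2
        else:
            out.append(words[i])
            i += 1
    return out

def parse_like_query_parser(qstr):
    """Split on whitespace first, then analyze each chunk separately."""
    terms = []
    for chunk in qstr.split():
        terms.extend(analyze(chunk))
    return terms

# The analyzer sees the full string, so the two-word rule can match.
whole = analyze("New York pizza")
# The analyzer sees one word at a time, so the rule never matches.
chunked = parse_like_query_parser("New York pizza")
```

As far as I understand it, a quoted phrase is handed to the analyzer as a
single chunk, which is one way around this for phrase queries, but that
does not help with unquoted input.
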
> : But trying to traipse through the code to get "the big picture" is a bit
> : involved.
> like i said: the world of query parsing in solr all revolves around the
> QParser API ... if you want to make sense of it, start there, and work out
> in both directions.
> PS: please, please, please ... as you make progress on understanding these
> internals, feel free to plagiarize this email as the starting point of a
> new wiki page documenting your understanding for others who come along
> with the same question.
> -Hoss
