lucene-solr-user mailing list archives

From Mark Bennett <mbenn...@ideaeng.com>
Subject Re: From a high level query call, tell Solr / Lucene to automatically apply a leaf operator?
Date Sun, 03 Mar 2013 22:42:58 GMT
Hi Mikhail,

Thanks for the links, looks like interesting stuff.

Sadly this project is stuck in 3.x for some very thorny reasons...

Googling around, looks like this might be strictly 4.x...

On Mon, Feb 25, 2013 at 12:21 PM, Mikhail Khludnev <
mkhludnev@griddynamics.com> wrote:

> Mark,
>
> AFAIK
>
> http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/core/package-summary.html
> is a convenient framework for such juggling.
> Please also be aware of the good starting point
>
> http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/standard/package-summary.html
>
>
>
> On Sun, Feb 24, 2013 at 11:33 AM, Mark Bennett <mbennett@ideaeng.com>
> wrote:
>
> > Scenario:
> >
> > You're submitting a block of text as a query.
> >
> > You're content to let Solr / Lucene handle query parsing and tokenization,
> > etc.
> >
> > But you'd like to have ALL eventually produced leaf-nodes in the parse tree
> > to have:
> > * Boolean .MUST (effectively a + prefix)
> > * Fuzzy match of ~1 or ~2
> >
> > In a simple application, and if there were no punctuation, you could
> > preprocess the query, effectively:
> > * split on whitespace
> > * for t in tokens: t = "+" + t + "~2"
> >
> > But this is ugly, and even then I think things like stop words would be
> > messed up:
> > * OK in Solr:   the chair    (it can properly remove "the")
> > * But if this:    +the~2  +chair~2   (I'm not sure this would work)
> >
> > Sure, at the application level you could also remove the stop words in the
> > "for t in tokens" loop, but then some other weird case would come up.
> > Maybe one of the field's analyzers has some other token filter you forgot
> > about, so you'd have to bring that logic forward as well.
> >
> > (Long story of why I'd want to do all this... and I know people think
> > adding ~2 to all tokens will give bad results anyway, trying to fix
> > inherited code that can't be scrapped, etc)
> >
> > --
> > Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
> > Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  <mkhludnev@griddynamics.com>
>
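
For reference, the naive application-level preprocessing described in the original question (split on whitespace, drop stop words, then mark each token as required and fuzzy) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `naive_fuzzy_must` and the `STOP_WORDS` set are hypothetical, the stop-word list is not taken from any real Solr analyzer configuration, and, as in the question, the input is assumed to contain no punctuation or query-syntax characters.

```python
# Hypothetical stop-word list for illustration only; a real Solr field
# would use whatever its analyzer chain is configured with.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def naive_fuzzy_must(query, max_edits=2, stop_words=STOP_WORDS):
    """Split on whitespace, drop stop words, and emit +token~N per token.

    Assumes the query has no punctuation or special query-syntax
    characters, matching the simple case described in the thread.
    """
    tokens = query.split()
    kept = [t for t in tokens if t.lower() not in stop_words]
    return " ".join("+{}~{}".format(t, max_edits) for t in kept)


if __name__ == "__main__":
    print(naive_fuzzy_must("the chair"))
```

This shows why the approach is fragile: every token filter in the field's analyzer (stop words here, but also stemming, synonyms, and so on) would have to be duplicated by hand in the application, which is the problem the question is trying to avoid.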
