lucene-solr-user mailing list archives

From Mikhail Khludnev <mkhlud...@griddynamics.com>
Subject Re: From a high level query call, tell Solr / Lucene to automatically apply a leaf operator?
Date Mon, 04 Mar 2013 08:54:36 GMT
Mark,

it's been there for ages:
http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/queryParser/core/package-summary.html
You are welcome!
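The idea behind the flexible parser is to change the parsed query tree after parsing, instead of rewriting the query string before it. Roughly, that is the job a custom `QueryNodeProcessor` would do. The sketch below is a toy Python model of that idea, not Lucene's API. `Term`, `BoolGroup`, `require_fuzzy_leaves`, `to_query_string` and the field name `body` are all hypothetical. It assumes the rewrite runs after analysis, so stop words and other token filters have already been applied and only the surviving leaves get the `+` and `~2`.

```python
from dataclasses import dataclass, field
from typing import List, Union

# Hypothetical, simplified query-tree nodes; these are NOT Lucene classes.
@dataclass
class Term:
    field_name: str
    text: str
    required: bool = False  # True models a "+" (MUST) clause
    fuzzy: int = 0          # max edits, as in the "~1" / "~2" syntax; 0 = exact

@dataclass
class BoolGroup:
    # Groups carry no MUST/SHOULD flag of their own in this toy model.
    children: List["Node"] = field(default_factory=list)

Node = Union[Term, BoolGroup]

def require_fuzzy_leaves(node: Node, edits: int = 2) -> Node:
    """Return a rewritten copy of the tree: every leaf becomes MUST + fuzzy."""
    if isinstance(node, Term):
        return Term(node.field_name, node.text, required=True, fuzzy=edits)
    return BoolGroup([require_fuzzy_leaves(c, edits) for c in node.children])

def to_query_string(node: Node) -> str:
    """Render the toy tree in Lucene-style query syntax, for inspection."""
    if isinstance(node, Term):
        s = f"{node.field_name}:{node.text}"
        if node.fuzzy:
            s += f"~{node.fuzzy}"
        return ("+" if node.required else "") + s
    return "(" + " ".join(to_query_string(c) for c in node.children) + ")"

# A tree as the analyzer might leave it for "the chair leg": "the" already dropped.
parsed = BoolGroup([Term("body", "chair"), Term("body", "leg")])
print(to_query_string(require_fuzzy_leaves(parsed)))
# prints (+body:chair~2 +body:leg~2)
```

Because the walk only touches leaves, nested groups are handled the same way at any depth. The original tree is also left unmodified, which makes it easy to compare the before and after versions.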


On Mon, Mar 4, 2013 at 2:42 AM, Mark Bennett <mbennett@ideaeng.com> wrote:

> Hi Mikhail,
>
> Thanks for the links, looks like interesting stuff.
>
> Sadly this project is stuck in 3.x for some very thorny reasons...
>
> Googling around, looks like this might be strictly 4.x...
>
> On Mon, Feb 25, 2013 at 12:21 PM, Mikhail Khludnev <mkhludnev@griddynamics.com> wrote:
>
> > Mark,
> >
> > AFAIK
> > http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/core/package-summary.html
> > is a convenient framework for such juggling.
> > Please also take a look at this good starting point:
> > http://lucene.apache.org/core/4_0_0-ALPHA/queryparser/org/apache/lucene/queryparser/flexible/standard/package-summary.html
> >
> > On Sun, Feb 24, 2013 at 11:33 AM, Mark Bennett <mbennett@ideaeng.com> wrote:
> >
> > > Scenario:
> > >
> > > You're submitting a block of text as a query.
> > >
> > > You're content to let Solr / Lucene handle query parsing, tokenization,
> > > etc.
> > >
> > > But you'd like ALL of the leaf nodes eventually produced in the parse
> > > tree to have:
> > > * Boolean MUST (BooleanClause.Occur.MUST, effectively a + prefix)
> > > * Fuzzy match of ~1 or ~2
> > >
> > > In a simple application, and if there were no punctuation, you could
> > > preprocess the query, effectively:
> > > * split on whitespace
> > > * for t in tokens: t = "+" + t + "~2"
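The naive preprocessing step can be sketched in Python like this. `naive_rewrite` is a hypothetical helper, not part of Solr or Lucene. One small difference from the loop above: reassigning `t` inside a Python `for` loop would not change the token list, so the sketch builds a new string instead.

```python
def naive_rewrite(query: str, edits: int = 2) -> str:
    """Hypothetical helper: prefix every whitespace-separated token with "+"
    and suffix it with "~<edits>". Punctuation, quotes, operators and stop
    words are all passed through untouched."""
    # Build a new sequence rather than reassigning the loop variable.
    return " ".join(f"+{t}~{edits}" for t in query.split())

print(naive_rewrite("the chair"))    # prints +the~2 +chair~2
print(naive_rewrite('"oak chair"'))  # prints +"oak~2 +chair"~2  (phrase broken)
```

The second call shows why the approach only works without punctuation. Splitting on whitespace cuts the phrase quotes apart and produces a malformed query.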
> > >
> > > But this is ugly, and even then I think things like stop words would be
> > > messed up:
> > > * OK in Solr:   the chair    (it can properly remove "the")
> > > * But this:    +the~2  +chair~2   (I'm not sure this would work)
> > >
> > > Sure, at the application level you could also remove the stop words in
> > > the "for t in tokens" loop, but then some other weird case would come up.
> > > Maybe one of the field's analyzers has some other token filter you forgot
> > > about, so you'd have to bring that logic forward as well.
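Filtering stop words inside the loop might look like the sketch below. `STOP_WORDS` and `rewrite_without_stop_words` are hypothetical, and the stop list is a made-up sample rather than any field's real configuration. Even this small version has to copy two pieces of analyzer behaviour into the application: the stop list and lowercasing.

```python
# Hypothetical stop list. The field's real analyzer may use a different list,
# and may apply other filters (stemming, synonyms, ...) this code ignores.
STOP_WORDS = {"a", "an", "and", "of", "the"}

def rewrite_without_stop_words(query: str, edits: int = 2) -> str:
    """Hypothetical helper: drop stop words, then add "+" and "~<edits>"."""
    out = []
    for t in query.split():
        # Lowercasing here duplicates the analyzer's lowercase filter.
        if t.lower() in STOP_WORDS:
            continue
        out.append(f"+{t}~{edits}")
    return " ".join(out)

print(rewrite_without_stop_words("The chair"))  # prints +chair~2
```

Any filter the sketch does not copy, such as a stemmer that maps "chairs" to "chair", is silently skipped. That is the kind of "other weird case" that makes rewriting the parsed tree more attractive than rewriting the string.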
> > >
> > > (Long story of why I'd want to do all this... and I know people think
> > > adding ~2 to all tokens will give bad results anyway; I'm trying to fix
> > > inherited code that can't be scrapped, etc.)
> > >
> > > --
> > > Mark Bennett / New Idea Engineering, Inc. / mbennett@ideaeng.com
> > > Direct: 408-733-0387 / Main: 866-IDEA-ENG / Cell: 408-829-6513
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > <http://www.griddynamics.com>
> >  <mkhludnev@griddynamics.com>
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhludnev@griddynamics.com>
