lucene-solr-user mailing list archives

From Isaac Hebsh <isaac.he...@gmail.com>
Subject Re: Prevention of heavy wildcard queries
Date Sun, 02 Jun 2013 17:09:51 GMT
Hi everyone.

I came across another need for term extraction: I want to find pairs of
words that appear together in queries. All of the "clustering" work is
ready, and the only missing piece is how to get the basic terms from the query.

Has nobody tried this before? Is there no clean way to do it?
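
To make the question concrete, here is a rough sketch of the kind of term-pair extraction meant above. It is deliberately naive: it splits the raw query string on whitespace, parentheses and quotes, and it is NOT a real query parser (no local params, no nested queries, no analysis chain). The class and method names (NaiveQueryTerms, terms, pairs) are made up for illustration and are not part of Solr or Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not a Solr/Lucene class.
final class NaiveQueryTerms {

    /** Splits a raw query string into bare, lowercased terms (naive, no real parsing). */
    static List<String> terms(String query) {
        List<String> out = new ArrayList<>();
        for (String tok : query.split("[\\s()\"]+")) {
            // Drop leading +/-/! operators and a field prefix such as "title:".
            String t = tok.replaceFirst("^[+\\-!]*(?:\\w+:)?", "");
            if (t.isEmpty() || t.equals("AND") || t.equals("OR") || t.equals("NOT")) {
                continue; // skip empty tokens and boolean operators
            }
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** Returns each unordered pair of distinct terms once, as "a|b" with a sorted before b. */
    static Set<String> pairs(List<String> terms) {
        Set<String> out = new TreeSet<>();
        for (int i = 0; i < terms.size(); i++) {
            for (int j = i + 1; j < terms.size(); j++) {
                String a = terms.get(i);
                String b = terms.get(j);
                if (a.equals(b)) {
                    continue; // a term does not pair with itself
                }
                out.add(a.compareTo(b) < 0 ? a + "|" + b : b + "|" + a);
            }
        }
        return out;
    }
}
```

For example, terms("title:Foo AND (bar OR \"baz qux\")") yields [foo, bar, baz, qux]. Anything beyond such simple queries is exactly where this approach breaks down, which is why a hook into the real parser would be preferable.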


On Tue, May 28, 2013 at 7:08 AM, Isaac Hebsh <isaac.hebsh@gmail.com> wrote:

> I don't want to affect the (correctness of the) real query parsing, so
> creating a QParserPlugin is risky.
> Instead, if I parse the query in my search component, it will be
> detached from the real query parsing (obviously this causes double
> parsing, but assume that's OK)...
>
>
> On Tue, May 28, 2013 at 3:52 AM, Roman Chyla <roman.chyla@gmail.com> wrote:
>
>> Hi Isaac,
>> it is as you say, with the exception that you create a QParserPlugin, not a
>> search component
>>
>> * create a QParserPlugin, give it some name, e.g. 'nw'
>> * make a copy of the pipeline - your component should be at the same place
>> as, or just above, the wildcard processor
>>
>> also make sure you are setting your qparser for FQ queries, i.e.
>> fq="{!nw}foo"
>>
>>
>> On Mon, May 27, 2013 at 5:01 PM, Isaac Hebsh <isaac.hebsh@gmail.com>
>> wrote:
>>
>> > Thanks Roman.
>> > Based on some of your suggestions, will the steps below do the job?
>> >
>> > * Create (and register) a new SearchComponent
>> > * In its prepare method: do the following for Q and all of the FQs (so
>> > this SearchComponent should run AFTER QueryComponent, in order to see all
>> > of the FQs)
>> > * Create
>> > org.apache.lucene.queryparser.flexible.standard.StandardQueryParser,
>> > with a special implementation of QueryNodeProcessorPipeline, which
>> > contains my NodeProcessor at the top of its list.
>> > * Set my analyzer into that StandardQueryParser
>> > * My NodeProcessor will be called for each term in the query, so it can
>> > throw an exception if a (basic) query node contains a wildcard at both
>> > the start and the end of the term.
>> >
>> > Is there a way to avoid reimplementing the whole StandardQueryParser
>> > class?
>> >
>>
>> you can try subclassing it, if it allows it
>>
>>
>> > Will this work for both LuceneQParser and EdismaxQParser queries?
>> >
>>
>> this will not work for edismax, nothing but changing the edismax qparser
>> will do the trick
>>
>>
>> >
>> > Any other solution/work-around? How do other production environments of
>> > Solr overcome this issue?
>> >
>>
>> you can also try modifying the standard solr parser, or even the JavaCC
>> generated classes
>> I believe many people do just that (or some sort of preprocessing)
>>
>> roman
>>
>>
>> >
>> >
>> > On Mon, May 27, 2013 at 10:15 PM, Roman Chyla <roman.chyla@gmail.com>
>> > wrote:
>> >
>> > > You are right that starting to parse the query before the query
>> > > component can soon get very ugly and complicated. You should take
>> > > advantage of the flex parser, it is already in lucene contrib - but if
>> > > you are interested in the better version, look at
>> > > https://issues.apache.org/jira/browse/LUCENE-5014
>> > >
>> > > The way you can solve this is:
>> > >
>> > > 1. use the standard syntax grammar (which allows *foo*)
>> > > 2. add (or modify) WildcardQueryNodeProcessor to dis/allow that case,
>> > > or raise error etc
>> > >
>> > > this way, you are changing semantics - but don't need to touch the
>> > > syntax definition; of course, you may also change the grammar and allow
>> > > only one instance of wildcard (or some combination) but for that you
>> > > should probably use LUCENE-5014
>> > >
>> > > roman
>> > >
>> > > On Mon, May 27, 2013 at 2:18 PM, Isaac Hebsh <isaac.hebsh@gmail.com>
>> > > wrote:
>> > >
>> > > > Hi.
>> > > >
>> > > > Searching for terms with a wildcard at their start is solved with
>> > > > ReversedWildcardFilterFactory. But what about terms with a wildcard
>> > > > at both start AND end?
>> > > >
>> > > > This query is heavy, and I want to disallow such queries from my
>> > > > users.
>> > > >
>> > > > I'm looking for a way to cause these queries to fail.
>> > > > I guess there is no built-in support for my need, so it is OK to
>> > > > write a new solution.
>> > > >
>> > > > My current plan is to create a search component (which will run
>> > > > before QueryComponent). It should analyze the query string, and drop
>> > > > the query if "too heavy" wildcards are found.
>> > > >
>> > > > Another option is to create a query parser, which wraps the current
>> > > > (specified or default) qparser, and does the same work as above.
>> > > >
>> > > > These two options require an analysis of the query text, which might
>> > > > be ugly work (just think about nested queries [using _query_], or
>> > > > even a lot of more basic scenarios like quoted terms, etc.)
>> > > >
>> > > > Am I missing a simple and clean way to do this?
>> > > > What would you do?
>> > > >
>> > > > P.S. if no simple solution exists, the timeAllowed limit is the best
>> > > > work-around I could think of. Any other suggestions?
>> > > >
>> > >
>> >
>>
>
>
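
For reference, the check discussed in the quoted thread (reject a term that has a wildcard at both its start and its end) can be sketched independently of where it is hooked in. The sketch below is plain Java with no Solr or Lucene dependency; the class name WildcardGuard and its methods are hypothetical. It assumes that both '*' and '?' count as wildcards, that a backslash escapes the following character, and that the caller has already isolated a single term (by a node processor, a qparser, or plain preprocessing).

```java
// Hypothetical helper, not a Solr/Lucene class.
final class WildcardGuard {

    /** True if the term both starts and ends with an unescaped '*' or '?'. */
    static boolean isDoubleWildcard(String term) {
        if (term.length() < 2) {
            return false; // a lone "*" or "?" is not treated as a double wildcard here
        }
        return isUnescapedWildcard(term, 0)
                && isUnescapedWildcard(term, term.length() - 1);
    }

    /** True if the character at index i is '*' or '?' and is not backslash-escaped. */
    private static boolean isUnescapedWildcard(String term, int i) {
        char c = term.charAt(i);
        if (c != '*' && c != '?') {
            return false;
        }
        // An odd number of directly preceding backslashes means the char is escaped.
        int backslashes = 0;
        for (int k = i - 1; k >= 0 && term.charAt(k) == '\\'; k--) {
            backslashes++;
        }
        return backslashes % 2 == 0;
    }

    /** Throws if the term is a double wildcard; otherwise does nothing. */
    static void check(String term) {
        if (isDoubleWildcard(term)) {
            throw new IllegalArgumentException(
                    "Leading and trailing wildcard not allowed: " + term);
        }
    }
}
```

With this, check("*foo*") throws, while check("foo*") and check("*foo") pass. Whether the exception should fail the whole request or only drop the offending clause is a separate decision that the sketch leaves open.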
