lucene-java-user mailing list archives

From Oliver Kaleske <Oliver.Kale...@ptvgroup.com>
Subject Re: null Query from MultiFieldQueryParser.getFieldQuery
Date Tue, 04 Oct 2016 13:34:31 GMT
Hi Steve,

thanks for the fix.

I locally applied the patch on branch_6_2 (because that is closest to my current 6.2.1 dependency)
and built Lucene from there.
With the resulting build in my application, the problem I observed there is fixed.

Best regards,
Oliver

-----Original Message-----
From: Steve Rowe [mailto:sarowe@gmail.com]
Sent: Friday, 30 September 2016 21:48
To: java-user@lucene.apache.org
Cc: Oliver Kaleske <Oliver.Kaleske@ptvgroup.com>
Subject: Re: null Query from MultiFieldQueryParser.getFieldQuery

Hi Oliver,

Thanks for reporting and for the analysis; this is a bug.

See <https://issues.apache.org/jira/browse/LUCENE-7472>, where I’ve put up a patch
with a fix that treats all non-BooleanQuery queries opaquely (like TermQuery), and adds a
test for the SynonymQuery case that fails without the patch and succeeds with it.
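
As a rough illustration only (this is not the LUCENE-7472 patch and does not use Lucene classes), the difference between the reported behaviour and the "treat everything that is not a BooleanQuery opaquely" approach can be sketched with stand-in types. The class MaxTermsSketch, its nested Query types, and the two method names are hypothetical and exist only for this example; the maxTerms logic follows the description in the original report quoted below.

```java
import java.util.Arrays;
import java.util.List;

public class MaxTermsSketch {
    // Hypothetical stand-ins for Lucene's query classes, for illustration only.
    interface Query {}
    static final class TermQuery implements Query {}
    static final class PhraseQuery implements Query {}
    static final class BooleanQuery implements Query {
        final List<Query> clauses;
        BooleanQuery(Query... clauses) { this.clauses = Arrays.asList(clauses); }
    }

    // Reported behaviour: only TermQuery and BooleanQuery contribute to maxTerms,
    // so any other query type leaves it at 0 and no clauses are built.
    static int maxTermsBeforePatch(List<Query> perFieldQueries) {
        int maxTerms = 0;
        for (Query q : perFieldQueries) {
            if (q == null) {
                continue; // e.g. the text was entirely stopwords for this field
            }
            if (q instanceof TermQuery) {
                maxTerms = Math.max(1, maxTerms);
            } else if (q instanceof BooleanQuery) {
                maxTerms = Math.max(maxTerms, ((BooleanQuery) q).clauses.size());
            }
        }
        return maxTerms;
    }

    // Approach described above: every non-BooleanQuery counts as one opaque unit,
    // the same way a TermQuery does.
    static int maxTermsAfterPatch(List<Query> perFieldQueries) {
        int maxTerms = 0;
        for (Query q : perFieldQueries) {
            if (q == null) {
                continue;
            }
            if (q instanceof BooleanQuery) {
                maxTerms = Math.max(maxTerms, ((BooleanQuery) q).clauses.size());
            } else {
                maxTerms = Math.max(1, maxTerms);
            }
        }
        return maxTerms;
    }

    public static void main(String[] args) {
        List<Query> perField = Arrays.asList(new PhraseQuery(), new PhraseQuery());
        System.out.println(maxTermsBeforePatch(perField)); // prints 0
        System.out.println(maxTermsAfterPatch(perField));  // prints 1
    }
}
```

With maxTerms at 0 no clauses are created, which matches the null result described in the report; with the opaque treatment, a PhraseQuery (or a SynonymQuery) yields one clause per field.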

If you could test the patch, that would be great.

--
Steve
www.lucidworks.com

> On Sep 29, 2016, at 11:24 AM, Adrien Grand <jpountz@gmail.com> wrote:
> 
> I'm not very familiar with this part of the code base so I could easily
> overlook something. Maybe you can open a JIRA and attach a minimal test
> case that reproduces the issue?
> 
> On Mon, Sep 19, 2016 at 13:48, Oliver Kaleske <Oliver.Kaleske@ptvgroup.com>
> wrote:
> 
>> Hi,
>> 
>> in updating Lucene from 6.1.0 to 6.2.0 I came across the following:
>> 
>> We have a subclass of MultiFieldQueryParser (MFQP) for creating a custom
>> type of Query, which calls getFieldQuery() on its base class (MFQP).
>> For each of its search fields, this method has a Query created by calling
>> getFieldQuery() on QueryParserBase.
>> Ultimately, we wind up in QueryBuilder's createFieldQuery() method, which
>> depending on the number of tokens (etc.) decides what type of Query to
>> return: a TermQuery, BooleanQuery, PhraseQuery, or MultiPhraseQuery.
>> 
>> Back in MFQP.getFieldQuery(), a variable maxTerms is determined depending
>> on the type of Query returned: for a TermQuery or a BooleanQuery, its value
>> will in general be nonzero, clauses are created, and a non-null Query is
>> returned.
>> However, other Query subclasses result in maxTerms=0, an empty list of
>> clauses, and finally null is returned.
>> 
>> To me, this seems like a bug, but I may well be missing something.
>> The comment "// happens for stopwords" on the return null statement,
>> however, seems to suggest that Query types other than TermQuery and
>> BooleanQuery were not considered properly here.
>> I should point out that our custom MFQP subclass so far does some rather
>> unsophisticated tokenization before calling getFieldQuery() on each token,
>> so characters like '*' may still slip through. So perhaps with proper
>> tokenization, it is guaranteed that only TermQuery and BooleanQuery can
>> come out of the chain of getFieldQuery() calls, and not handling
>> (Multi)PhraseQuery in MFQP.getFieldQuery() can never cause trouble?
>> 
>> The code in MFQP.getFieldQuery dates back to
>> LUCENE-2605: Add classic QueryParser option setSplitOnWhitespace() to
>> control whether to split on whitespace prior to text analysis.  Default
>> behavior remains unchanged: split-on-whitespace=true.
>> (06 Jul 2016), when it was substantially expanded.
>> 
>> Best regards,
>> Oliver
>> 
