lucene-dev mailing list archives

From "Alexey Kozhemiakin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-7167) ANY operator synax - score only top matching term
Date Thu, 26 Feb 2015 15:05:08 GMT

    [ https://issues.apache.org/jira/browse/SOLR-7167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338498#comment-14338498
] 

Alexey Kozhemiakin commented on SOLR-7167:
------------------------------------------


So we called it edismaxplus:
 
1. This is an initial version that implements the ANY operator logic and does not break other
query parsers.
2. This is an implementation of the approach described in the previous email. We have chosen
approach number 3:
 
Let’s still parse queries from left to right, but remove BooleanQueries when we have an ANY operator
and introduce DisjunctionMaxQueries in their place.
The query is parsed from left to right:
• NOT sets the Occurs flag of the clause to its right to MUST_NOT
• AND will change the Occurs flag of the clause to its left to MUST unless it
has already been set to MUST_NOT
• AND sets the Occurs flag of the clause to its right to MUST
• If the default operator of the query parser has been set to “And”: OR will
change the Occurs flag of the clause to its left to SHOULD unless it has already been set
to MUST_NOT
• OR sets the Occurs flag of the clause to its right to SHOULD
• ANY will not change the Occurs flag of the clause to its left, but it needs
to remove the Boolean query and create a DisjunctionMaxQuery in its place.
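The left-to-right rules above can be modeled roughly as follows. This is only a toy sketch in plain Java, not the code from the patch and not Solr's parser: the class and method names (LeftToRightOccurSketch, parse, flagLeft) are made up for illustration, clauses are single whitespace-separated terms, and a DisjunctionMaxQuery is represented simply as a "(a | b)" string.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the left-to-right rules; hypothetical names, not Solr's actual parser code.
class LeftToRightOccurSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        String text;   // a term, or a "(a | b)" group standing in for a DisjunctionMaxQuery
        Occur occur;
        Clause(String text, Occur occur) { this.text = text; this.occur = occur; }
        @Override public String toString() {
            String prefix = occur == Occur.MUST ? "+" : occur == Occur.MUST_NOT ? "-" : "";
            return prefix + text;
        }
    }

    // Operators re-flag the clause to their left, unless it is already MUST_NOT.
    static void flagLeft(List<Clause> clauses, Occur occur) {
        if (clauses.isEmpty()) return;
        Clause left = clauses.get(clauses.size() - 1);
        if (left.occur != Occur.MUST_NOT) left.occur = occur;
    }

    static List<Clause> parse(String query, boolean defaultOperatorIsAnd) {
        List<Clause> clauses = new ArrayList<>();
        String pending = "";     // operator waiting for the clause to its right
        boolean negate = false;  // set by NOT, consumed by the next clause
        for (String tok : query.trim().split("\\s+")) {
            if (tok.isEmpty()) continue;
            switch (tok) {
                case "AND":
                    flagLeft(clauses, Occur.MUST);
                    pending = tok;
                    break;
                case "OR":
                    if (defaultOperatorIsAnd) flagLeft(clauses, Occur.SHOULD);
                    pending = tok;
                    break;
                case "ANY":
                    pending = tok;  // the left clause keeps its Occur flag
                    break;
                case "NOT":
                    negate = true;
                    break;
                default:
                    if (pending.equals("ANY") && !clauses.isEmpty()) {
                        // Pack the left value and this term into one dismax-style group.
                        // NOT inside an ANY chain is not modeled in this sketch.
                        Clause left = clauses.get(clauses.size() - 1);
                        left.text = "(" + left.text + " | " + tok + ")";
                    } else {
                        Occur occur = defaultOperatorIsAnd ? Occur.MUST : Occur.SHOULD;
                        if (pending.equals("AND")) occur = Occur.MUST;
                        if (pending.equals("OR")) occur = Occur.SHOULD;
                        if (negate) occur = Occur.MUST_NOT;
                        clauses.add(new Clause(tok, occur));
                    }
                    pending = "";
                    negate = false;
            }
        }
        return clauses;
    }

    public static void main(String[] args) {
        System.out.println(parse("a OR b AND c", false));        // [a, +b, +c]
        System.out.println(parse("disk ANY cd ANY dvd", false)); // [((disk | cd) | dvd)]
    }
}
```

The first call shows why the token-stream reading is not Boolean logic: with the default operator OR, "a OR b AND c" ends up with b and c both required. The second call shows the ANY chain collapsing into a single nested group while the clause keeps its original flag.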
In the previous approach, things quickly got too complicated. The current grammar does not
in fact represent Boolean logic. There is no Boolean logic grammar tree. It is read more
like a stream of tokens, left to right: when you have AND, you change the Occurs flag
of the clause to its left to MUST unless it has already been set to MUST_NOT, and AND also sets
the Occurs flag of the clause to its right to MUST.
You can read more about it here: https://lucidworks.com/blog/why-not-and-or-and-not/
To make it all work, we would need to define the grammar in such a way that OR takes two
operands, AND takes two operands, and there is a real tree structure in it. Then we could introduce
another operator, ANY:
<AnyOp> ::=
<Clause (<field>)> <ANY> <Clause(<field>)> (<ANY> <Clause(<field>)>)*
This might introduce quite a few surprises, though. We would need to make sure that, even though
parsing is different, the end result stays the same for the AND and OR operators. It could also
simply take too much time to implement correctly.
The current patch seems to solve all the issues raised, such as different values of the mm parameter,
multiple query fields in edismax's qf, and incorrect query syntax. We also don't have to deal with
different coord factors, as we will be extending the edismax query parser.
 
With this JAR we have addressed all the issues mentioned above, and no other parsers are
broken. The behaviour of the ANY operator is consistent with the behaviour of the AND and OR operators
in the existing parser: it is parsed from left to right and has possessive behaviour similar to
the AND operator. The left value is captured and packed into a DisjunctionMaxQuery, as in the
following example:
 
{!edismaxplus}disk ANY cd ANY dvd
 
becomes
 
(+(DisjunctionMaxQuery((((text:disk) | (text:cd)) | (text:dvd)))))/no_coord
 
 
Note that when multiple ANY operators are chained, the lvalue holding the DisjunctionMaxQuery
is treated as a subquery. This behaviour is desired for compatibility with the various subqueries
that can occur as an R or L value, and the scoring works as designed because
 
max(max(a,b),c) = max(a,b,c)
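
The identity can be checked with a small sketch. The Java below is illustrative only: DisMaxScoreSketch and its methods are hypothetical names, the sub-scores are invented non-negative numbers, and coord is ignored. It assumes the usual dismax scoring rule (the maximum sub-score plus a tie-breaker times the sum of the remaining sub-scores); with a tie-breaker of 0, which is what the parsed query above shows, nesting makes no difference.

```java
// Sketch of dismax-style scoring; hypothetical names, invented scores, coord ignored.
class DisMaxScoreSketch {
    // Maximum sub-score plus tieBreaker times the sum of the other sub-scores.
    // Assumes non-negative sub-scores.
    static double disMax(double tieBreaker, double... subScores) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : subScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    // A plain OR of SHOULD clauses adds up every matching clause.
    static double booleanOr(double... subScores) {
        double sum = 0.0;
        for (double s : subScores) sum += s;
        return sum;
    }

    public static void main(String[] args) {
        double disk = 1.2, cd = 0.7, dvd = 2.5;  // made-up per-term scores

        double nested = disMax(0.0, disMax(0.0, disk, cd), dvd);
        double flat = disMax(0.0, disk, cd, dvd);
        System.out.println("nested == flat: " + (nested == flat));

        // OR rewards documents matching several terms; ANY keeps only the best one.
        System.out.println("OR  score: " + booleanOr(disk, cd, dvd));
        System.out.println("ANY score: " + flat);
    }
}
```

One caveat under the same assumption: if a non-zero tie-breaker were ever applied to these nested queries, the nested and flat forms would no longer score identically, because the inner query's tie contribution is scaled again by the outer one.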
 
3. To make future maintenance easier (e.g. Solr version upgrades), the parser plugin would require
some additional work. For now it is based directly on the existing edismax parser codebase, with
minimal modifications to make it work with our code. As a result, a lot of functionality has been
extracted from mainline and injected into the plugin (the base edismax implementation, the whole
query parser, etc.). To improve this, we need to create extension points (there are none today)
in the existing edismax parser and submit them as a patch to the community. The whole implementation
of the ANY operator should then be based solely on those extension points, with only a stub parser plugin
on top to distinguish between base edismax and edismaxplus.


> ANY operator synax - score only top matching term
> -------------------------------------------------
>
>                 Key: SOLR-7167
>                 URL: https://issues.apache.org/jira/browse/SOLR-7167
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>    Affects Versions: 5.0
>            Reporter: Alexey Kozhemiakin
>
> When we query
> (<term A> OR <term B> OR <term C> OR <term D>)
> and in case a document contains 2 or more of these terms: only the highest scoring term
> should contribute to the final relevancy score; possibly lower scoring terms should be discarded
> from the scoring algorithm.
> Ideally I'd like an operator like ANY:
> (<term A> ANY <term B> ANY <term C> ANY <term D>)
> that has the purpose: return documents, sorted by the score of the highest scoring term.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

