lucene-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-13336) solrconfig.xml maxBooleanClauses ignored by programmatic/rewritten queries; can result in exponential expansion of naive queries
Date Fri, 19 Apr 2019 18:01:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16822093#comment-16822093 ]

ASF subversion and git services commented on SOLR-13336:
--------------------------------------------------------

Commit 1c3d23e58a987e60a0af08b9fca2211908cf49d3 in lucene-solr's branch refs/heads/master
from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1c3d23e ]

SOLR-13336: fix CloudInspectUtil to use filter to eliminate risk of TooManyClausesException


> solrconfig.xml maxBooleanClauses ignored by programmatic/rewritten queries; can result in
exponential expansion of naive queries
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13336
>                 URL: https://issues.apache.org/jira/browse/SOLR-13336
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers
>    Affects Versions: 7.0, 8.0
>            Reporter: Michael Gibney
>            Assignee: Hoss Man
>            Priority: Major
>             Fix For: 8.1, master (9.0)
>
>         Attachments: SOLR-13336.patch, SOLR-13336.patch, SOLR-13336.patch
>
>
> Changes made in Solr 7.0 set the effective value of {{BooleanQuery.getMaxClauseCount}}
to {{Integer.MAX_VALUE-1}} and only imposed a restriction based on the (existing) solrconfig.xml
setting at the Solr query parser level via a new utility helper method.
> But this means programmatically generated queries (either by low-level Lucene methods,
or by query rewriting) no longer had any safety valve to prevent (effectively) infinite expansion.
 This issue fixes this problem by:
> * restoring a default upper bound on {{BooleanQuery.getMaxClauseCount}} of 1024
> * introducing a new solr.xml level setting for configuring this upper bound:{noformat}
> <int name="maxBooleanClauses">${solr.max.booleanClauses:1024}</int>
> {noformat}
> *NOTE* that this solr.xml limit is a hard upper bound that supersedes the existing solrconfig.xml
setting, which has been left in place and still limits the size of user-specified boolean
queries.  I.e.: solr.xml maxBooleanClauses >= solrconfig.xml maxBooleanClauses >= number
of clauses a user explicitly specifies in a query string; solr.xml maxBooleanClauses >=
number of clauses in an expanded/rewritten query
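
The relationship between the two limits in the NOTE above can be sketched in plain Java. This is only an illustration of the stated inequalities, not Solr's implementation: the class {{ClauseLimits}}, its fields, and its methods are hypothetical names, and clamping the solrconfig.xml value to the solr.xml value is an assumption about how "hard upper bound" plays out.

```java
// Hypothetical sketch of the two-level limit described in the NOTE.
// None of these names are Solr or Lucene APIs.
final class ClauseLimits {
    final int globalMax; // stands in for solr.xml maxBooleanClauses (hard upper bound)
    final int parserMax; // stands in for solrconfig.xml maxBooleanClauses

    ClauseLimits(int globalMax, int parserMax) {
        this.globalMax = globalMax;
        // Assumption: a per-core value larger than the hard bound is clamped to it,
        // so that globalMax >= parserMax always holds.
        this.parserMax = Math.min(parserMax, globalMax);
    }

    // Clauses a user writes explicitly in a query string are checked
    // against the per-core limit.
    boolean allowsUserQuery(int explicitClauses) {
        return explicitClauses <= parserMax;
    }

    // Clauses produced by expansion or rewriting are checked against
    // the hard upper bound only.
    boolean allowsRewrittenQuery(int expandedClauses) {
        return expandedClauses <= globalMax;
    }

    public static void main(String[] args) {
        ClauseLimits limits = new ClauseLimits(1024, 512);
        System.out.println("user query with 600 clauses allowed: " + limits.allowsUserQuery(600));
        System.out.println("rewritten query with 600 clauses allowed: " + limits.allowsRewrittenQuery(600));
        System.out.println("rewritten query with 2048 clauses allowed: " + limits.allowsRewrittenQuery(2048));
    }
}
```

In this sketch a rewritten query may exceed the solrconfig.xml value as long as it stays within the solr.xml value, which matches the second inequality in the NOTE.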
> {panel:title=original bug report}
> Since SOLR-10921 it appears that Solr always sets {{BooleanQuery.maxClauseCount}} (at
the Lucene level) to {{Integer.MAX_VALUE-1}}. I assume this is because Solr parses {{maxBooleanClauses}}
out of the config and applies it externally.
> In any case, when used as part of {{lucene.util.QueryBuilder.analyzeGraphPhrase}} (and
possibly other places?), the Lucene code checks internally against only the static {{maxClauseCount}}
variable (permanently set to {{Integer.MAX_VALUE-1}} in the context of Solr).
> Thus in at least one case ({{analyzeGraphPhrase()}}, but possibly others?), {{maxBooleanClauses}}
is having no effect. I'm pretty sure this is what's underlying the [issue reported here as
being related to Solr 7.6|https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201902.mbox/%3CCAF%3DheHE6-MOtn2XRbEg7%3D1tpNEGtE8GaChnOhFLPeJzpF18SGA%40mail.gmail.com%3E].
> To summarize, users are definitely susceptible (to varying degrees of likely severity,
assuming no actual _malicious_ attack) if:
>  # Running Solr >= 7.6.0
>  # Using edismax with "ps" param set to >0
>  # Query-time analysis chain is _at all_ capable of producing graphs (e.g., WordDelimiterGraphFilter,
SynonymGraphFilter that has corresponding synonyms with varying token lengths).
> Users are _particularly_ vulnerable in practice if they have query-time {{WordDelimiterGraphFilter}}
configured with {{preserveOriginal=true}}.
> To clarify, Lucene/Solr 7.6 didn't exactly _introduce_ the issue; it only increased the
likelihood of problems manifesting (as a result of LUCENE-8531). Notably, the "enumerated
strings" approach to graph phrase query (reintroduced by LUCENE-8531) was previously in place
pre-6.5 – at which point it could rely on default Lucene-level {{maxClauseCount}} failsafe
(removed as of 7.0). This explains the odd "Affects versions" => maxBooleanClauses was
disabled at the Lucene level (in Solr contexts) starting with version 7.0, but the change
became more likely to manifest problems for users as of 7.6.
> {panel}
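
To make the "exponential expansion" concrete, here is a minimal plain-Java sketch of why enumerating every phrase through a token graph grows so quickly. It is a simplified model, not Lucene's {{analyzeGraphPhrase}} code: the class and method names are hypothetical, and the token graph is reduced to a list of positions that each carry a set of alternatives (for example, an original token and its split form, as {{preserveOriginal=true}} might produce).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of "enumerated strings" phrase expansion.
// Real token graphs can have side paths of varying length; here every
// position simply offers a list of alternative strings.
final class GraphPhraseExpansion {

    // Builds every phrase by picking one alternative per position.
    static List<String> enumeratePhrases(List<List<String>> positions) {
        List<String> phrases = new ArrayList<>();
        phrases.add("");
        for (List<String> alternatives : positions) {
            List<String> next = new ArrayList<>();
            for (String prefix : phrases) {
                for (String alternative : alternatives) {
                    next.add(prefix.isEmpty() ? alternative : prefix + " " + alternative);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    // The number of phrases is the product of the alternatives per position.
    static long countPhrases(List<List<String>> positions) {
        long count = 1;
        for (List<String> alternatives : positions) {
            count = Math.multiplyExact(count, (long) alternatives.size());
        }
        return count;
    }

    public static void main(String[] args) {
        // Eleven query terms, each with two alternatives (original and split form).
        List<String> alternatives = List.of("wi-fi", "wi fi");
        List<List<String>> query = Collections.nCopies(11, alternatives);
        System.out.println(countPhrases(query) + " enumerated phrases vs. a limit of 1024");
    }
}
```

Under this model, eleven terms with two alternatives each already yield 2^11 = 2048 phrases, more than the restored default limit of 1024, and each additional such term doubles the count. Without a Lucene-level cap, nothing stops that growth.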



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

