lucene-dev mailing list archives

From "Andrzej Bialecki (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-13336) maxBooleanClauses ignored; can result in exponential expansion of naive queries
Date Thu, 11 Apr 2019 09:33:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-13336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815251#comment-16815251 ]

Andrzej Bialecki  commented on SOLR-13336:
------------------------------------------

+1 to the change. It makes sense from an operational point of view, too: set a safe global
limit, then lower it further per collection as necessary, knowing that it will never get worse
than the global limit no matter how sloppy your query parser is.
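
To illustrate the operational model described above, here is a minimal sketch of how a per-collection limit might be capped by a global one. The class and method names are hypothetical and are not taken from the patch; the assumption that an oversized per-collection value falls back to the global limit is mine.

```java
import java.util.OptionalInt;

/** Hypothetical helper for illustration only; not Solr's actual implementation. */
final class ClauseLimitSketch {

    /**
     * Returns the limit to enforce for one collection: the per-collection value
     * when it is set and does not exceed the global limit, otherwise the global limit.
     */
    static int effectiveLimit(int globalLimit, OptionalInt perCollectionLimit) {
        if (globalLimit < 1) {
            throw new IllegalArgumentException("global limit must be positive");
        }
        if (!perCollectionLimit.isPresent()) {
            return globalLimit; // nothing configured for this collection
        }
        int requested = perCollectionLimit.getAsInt();
        if (requested > globalLimit) {
            // Assumption: warn and fall back, so the global limit is never exceeded.
            System.err.printf(
                "maxBooleanClauses of %d is greater than global limit of %d; using the global limit%n",
                requested, globalLimit);
            return globalLimit;
        }
        return requested;
    }

    public static void main(String[] args) {
        System.out.println(effectiveLimit(1024, OptionalInt.of(512)));  // lowered per collection
        System.out.println(effectiveLimit(1024, OptionalInt.of(4096))); // capped at the global limit
    }
}
```

Whatever a collection's config requests, the value returned here never exceeds the global limit, which is the property that makes the global setting a safe operational default.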

I think this should be backported to all active 8.x branches; I'm not sure about 7.x.

A few minor comments:
 * `SolrConfig`: "log.warn("solrconfig.xml: <maxBooleanClauses> of {} is greater then
global limit of {}": "then" -> "than"
 * `TestSolrQueryParser` @BeforeClass: I wonder if this won't affect other unrelated tests
executing in the same JVM?
 * `CoreContainer`: "if (null != this.cfg.getBooleanQueryMaxClauseCount()) {" - this hurts
my eyes a little ;) could we please swap the sides?
 * `StreamExpressionTest`: "// use filter() to allow eing parsed": "eing" -> "being"

There are also a few small typos in the Ref Guide docs:
 * "whether those clauses where explicitly": "where" -> "were". Not sure if the second
part of the sentence doesn't need a verb...
 * "specify more clauses then this": "then" -> "than"
 * "per-collection limit is greater then": "then" -> "than"

> maxBooleanClauses ignored; can result in exponential expansion of naive queries
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-13336
>                 URL: https://issues.apache.org/jira/browse/SOLR-13336
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: query parsers
>    Affects Versions: 7.0, 7.6, master (9.0)
>            Reporter: Michael Gibney
>            Assignee: Hoss Man
>            Priority: Major
>         Attachments: SOLR-13336.patch, SOLR-13336.patch
>
>
> Since SOLR-10921 it appears that Solr always sets {{BooleanQuery.maxClauseCount}} (at
the Lucene level) to {{Integer.MAX_VALUE-1}}. I assume this is because Solr parses {{maxBooleanClauses}}
out of the config and applies it externally.
> In any case, when used as part of {{lucene.util.QueryBuilder.analyzeGraphPhrase}} (and
possibly other places?), the Lucene code checks internally against only the static {{maxClauseCount}}
variable (permanently set to {{Integer.MAX_VALUE-1}} in the context of Solr).
> Thus in at least one case ({{analyzeGraphPhrase()}}, but possibly others?), {{maxBooleanClauses}}
is having no effect. I'm pretty sure this is what's underlying the [issue reported here as
being related to Solr 7.6|https://mail-archives.apache.org/mod_mbox/lucene-solr-user/201902.mbox/%3CCAF%3DheHE6-MOtn2XRbEg7%3D1tpNEGtE8GaChnOhFLPeJzpF18SGA%40mail.gmail.com%3E].
> To summarize, users are definitely susceptible (to varying degrees of likely severity,
assuming no actual _malicious_ attack) if:
>  # Running Solr >= 7.6.0
>  # Using edismax with "ps" param set to >0
>  # Query-time analysis chain is _at all_ capable of producing graphs (e.g., WordDelimiterGraphFilter,
or SynonymGraphFilter that has corresponding synonyms with varying token lengths).
> Users are _particularly_ vulnerable in practice if they have query-time {{WordDelimiterGraphFilter}}
configured with {{preserveOriginal=true}}.
> To clarify, Lucene/Solr 7.6 didn't exactly _introduce_ the issue; it only increased the
likelihood of problems manifesting (as a result of LUCENE-8531). Notably, the "enumerated
strings" approach to graph phrase query (reintroduced by LUCENE-8531) was previously in place
pre-6.5 – at which point it could rely on default Lucene-level {{maxClauseCount}} failsafe
(removed as of 7.0). This explains the odd "Affects versions": maxBooleanClauses was
disabled at the Lucene level (in Solr contexts) starting with version 7.0, but the change
became more likely to manifest problems for users as of 7.6.
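
The exponential expansion described in the issue can be sketched with a simplified model. This is not Lucene's code. It assumes each query position carries a flat list of alternative tokens (real graph token streams can have alternatives that span differing numbers of positions), and the class and method names are hypothetical. Enumerating every phrase multiplies the alternatives at each position, so a clause cap is the only thing that bounds the work.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified illustration of "enumerated strings" expansion; not Lucene's implementation. */
final class GraphExpansionSketch {

    /** Number of distinct phrases: the product of the alternatives at each position. */
    static long countPaths(List<List<String>> positions) {
        long total = 1;
        for (List<String> alternatives : positions) {
            try {
                total = Math.multiplyExact(total, (long) alternatives.size());
            } catch (ArithmeticException overflow) {
                return Long.MAX_VALUE; // saturate instead of wrapping around
            }
        }
        return total;
    }

    /** Enumerates every phrase, failing fast once the count would exceed maxClauses. */
    static List<String> enumeratePhrases(List<List<String>> positions, int maxClauses) {
        List<String> phrases = new ArrayList<>();
        phrases.add("");
        for (List<String> alternatives : positions) {
            long nextSize = (long) phrases.size() * alternatives.size();
            if (nextSize > maxClauses) {
                throw new IllegalStateException(
                    "expansion needs at least " + nextSize + " clauses; limit is " + maxClauses);
            }
            List<String> next = new ArrayList<>((int) nextSize);
            for (String prefix : phrases) {
                for (String alternative : alternatives) {
                    next.add(prefix.isEmpty() ? alternative : prefix + " " + alternative);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    public static void main(String[] args) {
        // Twenty positions with two alternatives each: 2^20 phrases.
        List<List<String>> positions = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            List<String> alternatives = new ArrayList<>();
            alternatives.add("orig" + i);
            alternatives.add("split" + i);
            positions.add(alternatives);
        }
        System.out.println(countPaths(positions));
        try {
            enumeratePhrases(positions, 1024);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a cap such as 1024 the enumeration is rejected after a handful of positions. With a cap of `Integer.MAX_VALUE - 1`, as the issue reports for the Lucene-level static value in Solr, the check in this sketch would almost never fire before memory ran out.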



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
