lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF subversion and git services (JIRA)" <>
Subject [jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations
Date Thu, 10 Aug 2017 09:46:01 GMT


ASF subversion and git services commented on LUCENE-7914:

Commit 25ee0141c19b7f6156c2c439a59fad97f383b8cc in lucene-solr's branch refs/heads/branch_7_0
from [~jimczi]
[;h=25ee014 ]

LUCENE-7914: AnalyzingSuggesterTest#testRandomRealisticKeys: trim big titles to make sure
that they can pass the max recursion level in Operations#topsortState.

> Add safeguards to RegExp.toAutomaton and Operations
> ---------------------------------------------------
>                 Key: LUCENE-7914
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Jim Ferenczi
>             Fix For: master (8.0), 7.1
>         Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch,
> When creating an automaton from a regexp, some operators can create more states than
> {code}
> a{10000}
> {code}
> The example above <b>creates</b> a single path with 10k states already determinized
so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or Operations.topoSortStates are
recursive and the maximum level of recursion depends on the longest path in the automaton.
So a large automaton like above can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will always be
adversarial cases where a large automaton is created from a small input so I think we should
also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite and Operations.topoSortStates
in order to limit stack overflows. The limit is set to 1000 so any automaton with a path bigger
than 1000 would throw an IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that a repeat
operator can create and throw a TooComplex..Exception when this limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on AutomatonQuery and
uses this as an optimization for PrefixQuery that uses infinite automatons only.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message