lucene-dev mailing list archives

From "Jim Ferenczi (JIRA)" <>
Subject [jira] [Updated] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations
Date Tue, 01 Aug 2017 12:27:00 GMT


Jim Ferenczi updated LUCENE-7914:
    Attachment: LUCENE-7914.patch

It's not; it's just a copy/paste from another test in the class. I pushed another iteration
that removes the serialization/deserialization from TestRegExp completely.

> Add safeguards to RegExp.toAutomaton and Operations
> ---------------------------------------------------
>                 Key: LUCENE-7914
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Jim Ferenczi
>         Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch
> When creating an automaton from a regexp, some operators can create more states than maxDeterminizedStates, for example:
> {code}
> a{10000}
> {code}
> The example above creates a single path of 10k states that is already deterministic, so maxDeterminizedStates is never checked.
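A minimal sketch of why the state limit never comes into play, using a hypothetical toy class (`TinyAutomaton` is illustrative only, not Lucene's `Automaton` API). Expanding `a{n}` by concatenation adds one state per repetition, and every state has exactly one outgoing transition, so the result is deterministic without any determinization step that could enforce a limit.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy automaton for illustration only; it is not Lucene's Automaton class.
final class TinyAutomaton {
    final List<int[]> transitions = new ArrayList<>(); // each entry is {from, label, to}
    int numStates;

    int createState() {
        return numStates++;
    }

    void addTransition(int from, char label, int to) {
        transitions.add(new int[] {from, label, to});
    }

    /** True if no state has two outgoing transitions with the same label. */
    boolean isDeterministic() {
        Set<Long> seen = new HashSet<>();
        for (int[] t : transitions) {
            long key = ((long) t[0] << 32) | t[1]; // (source state, label) pair
            if (!seen.add(key)) {
                return false;
            }
        }
        return true;
    }

    /** Expands the regexp c{n} by concatenation: one new state per repetition. */
    static TinyAutomaton repeatChar(char c, int n) {
        TinyAutomaton a = new TinyAutomaton();
        int prev = a.createState(); // initial state
        for (int i = 0; i < n; i++) {
            int next = a.createState();
            a.addTransition(prev, c, next);
            prev = next;
        }
        return a;
    }

    public static void main(String[] args) {
        TinyAutomaton a = repeatChar('a', 10000);
        // A single path: already deterministic, so no determinization (and no state limit) is involved.
        System.out.println(a.numStates + " states, deterministic=" + a.isDeterministic());
    }
}
```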
> Some operations on automata, such as Operations.isFinite and Operations.topoSortStates, are recursive, and their maximum recursion depth depends on the longest path in the automaton. A large automaton like the one above can therefore overflow the Java stack.
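To make the depth argument concrete, here is a simplified recursive cycle check in the same spirit (a sketch over a plain adjacency array `succ[s]`, not Lucene's actual `Operations.isFinite` code; `RecursiveIsFiniteDemo`, `visit`, and `chain` are hypothetical names). Each state on the current path holds one stack frame, so on a linear chain like the one `a{10000}` produces, the recursion depth equals the path length.

```java
import java.util.BitSet;

final class RecursiveIsFiniteDemo {
    static int maxDepth; // deepest recursion level reached by the last isFinite call

    /** succ[s] lists the destination states of s; true if no cycle is reachable from state 0. */
    static boolean isFinite(int[][] succ) {
        maxDepth = 0;
        return visit(succ, 0, new BitSet(), new BitSet(), 1);
    }

    // Recursive DFS: one stack frame per state on the current path, so the
    // recursion depth grows with the longest path in the automaton.
    private static boolean visit(int[][] succ, int s, BitSet path, BitSet visited, int depth) {
        maxDepth = Math.max(maxDepth, depth);
        path.set(s);
        for (int t : succ[s]) {
            // Reaching a state already on the current path means a cycle, i.e. an infinite language.
            if (path.get(t) || (!visited.get(t) && !visit(succ, t, path, visited, depth + 1))) {
                return false;
            }
        }
        path.clear(s);
        visited.set(s);
        return true;
    }

    /** Linear chain 0 -> 1 -> ... -> n, the shape produced by expanding a{n}. */
    static int[][] chain(int n) {
        int[][] succ = new int[n + 1][];
        for (int i = 0; i < n; i++) {
            succ[i] = new int[] {i + 1};
        }
        succ[n] = new int[0];
        return succ;
    }

    public static void main(String[] args) {
        // Depth grows linearly with chain length; a long enough chain ends in StackOverflowError.
        System.out.println(isFinite(chain(500)) + " depth=" + maxDepth);
    }
}
```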
> In most cases we are covered by maxDeterminizedStates, but there will always be adversarial cases where a small input produces a large automaton, so I think the recursive methods should also have their own safeguards.
> I've attached a patch that adds a maximum recursion level to Operations.isFinite and Operations.topoSortStates to prevent stack overflows. The limit is set to 1000, so any automaton with a path longer than 1000 throws an IllegalStateException.
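A sketch of the safeguard idea, assuming the limit is counted as the number of states on the current DFS path (the exact off-by-one convention in the patch is not stated here). `BoundedIsFinite` and its helpers are hypothetical; only the limit of 1000 and the choice of IllegalStateException come from the issue text.

```java
import java.util.BitSet;

final class BoundedIsFinite {
    // Limit quoted in the issue; counting it per state on the DFS path is an assumption of this sketch.
    static final int MAX_RECURSION_LEVEL = 1000;

    /** succ[s] lists the destination states of s; true if no cycle is reachable from state 0. */
    static boolean isFinite(int[][] succ) {
        return visit(succ, 0, new BitSet(), new BitSet(), 1);
    }

    private static boolean visit(int[][] succ, int s, BitSet path, BitSet visited, int level) {
        if (level > MAX_RECURSION_LEVEL) {
            // Fail fast with a catchable exception instead of risking a StackOverflowError.
            throw new IllegalStateException(
                "automaton path too long: " + level + " > " + MAX_RECURSION_LEVEL);
        }
        path.set(s);
        for (int t : succ[s]) {
            if (path.get(t) || (!visited.get(t) && !visit(succ, t, path, visited, level + 1))) {
                return false;
            }
        }
        path.clear(s);
        visited.set(s);
        return true;
    }

    /** Linear chain 0 -> 1 -> ... -> n (n + 1 states). */
    static int[][] chain(int n) {
        int[][] succ = new int[n + 1][];
        for (int i = 0; i < n; i++) {
            succ[i] = new int[] {i + 1};
        }
        succ[n] = new int[0];
        return succ;
    }

    public static void main(String[] args) {
        System.out.println(isFinite(chain(999))); // 1000 states on the path: accepted
        try {
            isFinite(chain(10000));
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The design trade-off is that a legitimately long but harmless automaton is now rejected too; the issue accepts that in exchange for a predictable exception instead of a crash.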
> The patch also uses maxDeterminizedStates to limit the number of states that a repeat operator can create, and throws a TooComplex..Exception when this limit is reached.
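A sketch of bounding a repeat expansion, assuming the check happens as each state is created. `BoundedRepeat`, `repeatChar`, and `TooComplexException` are hypothetical names; the last is only a stand-in for the exception the issue abbreviates as "TooComplex..Exception", and 10000 below is just an example limit.

```java
import java.util.ArrayList;
import java.util.List;

final class BoundedRepeat {
    // Hypothetical stand-in for the "TooComplex..Exception" mentioned in the issue.
    static final class TooComplexException extends RuntimeException {
        TooComplexException(String message) {
            super(message);
        }
    }

    /**
     * Expands the regexp c{n} into a chain of n + 1 states (succ[s] = successors of s),
     * throwing as soon as the number of states would exceed maxDeterminizedStates.
     */
    static int[][] repeatChar(int n, int maxDeterminizedStates) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be >= 0");
        }
        List<int[]> succ = new ArrayList<>();
        succ.add(new int[0]); // initial state
        for (int i = 0; i < n; i++) {
            // Check before allocating, so an adversarial a{10000000} fails early and cheaply.
            if (succ.size() + 1 > maxDeterminizedStates) {
                throw new TooComplexException(
                    "c{" + n + "} needs more than " + maxDeterminizedStates + " states");
            }
            succ.set(i, new int[] {i + 1}); // transition i -c-> i + 1
            succ.add(new int[0]);
        }
        return succ.toArray(new int[0][]);
    }

    public static void main(String[] args) {
        System.out.println(repeatChar(9999, 10000).length + " states");
        try {
            repeatChar(10000, 10000);
        } catch (TooComplexException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```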
> Finally, the patch adds the ability to skip Operations.isFinite in AutomatonQuery and uses it as an optimization for PrefixQuery, which only uses infinite automata.
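A sketch of the "skip isFinite" idea, assuming a nullable hint where `null` means "unknown, compute it" (how the patch actually exposes this on AutomatonQuery is not shown here; `FiniteHintDemo`, `resolveFinite`, and `prefixAutomaton` are hypothetical). A prefix automaton always ends in an "any suffix" loop, so its caller already knows the answer is false and can avoid the recursive walk entirely. For simplicity the sketch uses a single-symbol alphabet.

```java
import java.util.BitSet;

final class FiniteHintDemo {
    static int isFiniteCalls; // how many times the recursive check actually ran

    /** "prefix + any suffix" over one symbol: a chain of prefixLength edges, then a self-loop. */
    static int[][] prefixAutomaton(int prefixLength) {
        int[][] succ = new int[prefixLength + 1][];
        for (int i = 0; i < prefixLength; i++) {
            succ[i] = new int[] {i + 1};
        }
        succ[prefixLength] = new int[] {prefixLength}; // the loop makes the language infinite
        return succ;
    }

    /** finiteHint == null means "unknown, compute it"; otherwise the caller's answer is trusted. */
    static boolean resolveFinite(int[][] succ, Boolean finiteHint) {
        if (finiteHint != null) {
            return finiteHint;
        }
        isFiniteCalls++;
        return visit(succ, 0, new BitSet(), new BitSet());
    }

    // Recursive cycle check: depth grows with the prefix length.
    private static boolean visit(int[][] succ, int s, BitSet path, BitSet visited) {
        path.set(s);
        for (int t : succ[s]) {
            if (path.get(t) || (!visited.get(t) && !visit(succ, t, path, visited))) {
                return false;
            }
        }
        path.clear(s);
        visited.set(s);
        return true;
    }

    public static void main(String[] args) {
        // A PrefixQuery-style caller passes FALSE and never recurses over its 5000-state prefix.
        System.out.println(resolveFinite(prefixAutomaton(5000), Boolean.FALSE) + " calls=" + isFiniteCalls);
        // Without the hint, the check walks the whole prefix before finding the loop.
        System.out.println(resolveFinite(prefixAutomaton(50), null) + " calls=" + isFiniteCalls);
    }
}
```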

This message was sent by Atlassian JIRA
