lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jim Ferenczi (JIRA)" <>
Subject [jira] [Created] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations
Date Tue, 01 Aug 2017 10:52:00 GMT
Jim Ferenczi created LUCENE-7914:

             Summary: Add safeguards to RegExp.toAutomaton and Operations
                 Key: LUCENE-7914
             Project: Lucene - Core
          Issue Type: Bug
            Reporter: Jim Ferenczi

When creating an automaton from a regexp, some operators can create more states than maxDeterminizedStates:


The example above <b>creates</b> a single path with 10k states already determinized
so maxDeterminizedStates is not checked. 
Some operations on automaton like Operations.isFinite or Operations.topoSortStates are recursive
and the maximum level of recursion depends on the longest path in the automaton. So a large
automaton like above can exceed java's stack.
In most of the cases we are covered by maxDeterminizedStates but there will always be adversarial
cases where a large automaton is created from a small input so I think we should also have
safeguards in the recursive methods. 

I've attached a patch that adds a max recursion level to Operations.isFinite and Operations.topoSortStates
in order to limit stack overflows. The limit is set to 1000 so any automaton with a path bigger
than 1000 would throw an IllegalStateException.
The patch also uses maxDeterminizedStates to limit the number of states that a repeat operator
can create and throw a TooComplex..Exception when this limit is reached.
Finally the patch adds the ability to skip Operations.isFinite on AutomatonQuery and uses
this as an optimization for PrefixQuery that uses finite automatons only.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message