lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF subversion and git services (JIRA)" <>
Subject [jira] [Commented] (LUCENE-7914) Add safeguards to RegExp.toAutomaton and Operations
Date Mon, 07 Aug 2017 07:13:00 GMT


ASF subversion and git services commented on LUCENE-7914:

Commit 6b3ea4d0c24adad9a270e43a3b5e07aac8317bee in lucene-solr's branch refs/heads/master
from [~jimczi]
[;h=6b3ea4d ]

LUCENE-7914: AnalyzingSuggesterTest#testRandomRealisticKeys: trim big titles to make sure
that they can pass the max recursion level in Operations#topsortState.

> Add safeguards to RegExp.toAutomaton and Operations
> ---------------------------------------------------
>                 Key: LUCENE-7914
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Jim Ferenczi
>             Fix For: master (8.0), 7.1
>         Attachments: LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch, LUCENE-7914.patch,
> When creating an automaton from a regexp, some operators can create more states than
> {code}
> a{10000}
> {code}
> The example above <b>creates</b> a single path with 10k states already determinized
so maxDeterminizedStates is not checked. 
> Some operations on automaton like Operations.isFinite or Operations.topoSortStates are
recursive and the maximum level of recursion depends on the longest path in the automaton.
So a large automaton like above can exceed java's stack.
> In most of the cases we are covered by maxDeterminizedStates but there will always be
adversarial cases where a large automaton is created from a small input so I think we should
also have safeguards in the recursive methods. 
> I've attached a patch that adds a max recursion level to Operations.isFinite and Operations.topoSortStates
in order to limit stack overflows. The limit is set to 1000 so any automaton with a path bigger
than 1000 would throw an IllegalStateException.
> The patch also uses maxDeterminizedStates to limit the number of states that a repeat
operator can create and throw a TooComplex..Exception when this limit is reached.
> Finally the patch adds the ability to skip Operations.isFinite on AutomatonQuery and
uses this as an optimization for PrefixQuery that uses infinite automatons only.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message