lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas Poppe (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-7921) More efficient way to transform a RegExp to an Automaton
Date Wed, 09 Aug 2017 07:04:00 GMT
Thomas Poppe created LUCENE-7921:
------------------------------------

             Summary: More efficient way to transform a RegExp to an Automaton
                 Key: LUCENE-7921
                 URL: https://issues.apache.org/jira/browse/LUCENE-7921
             Project: Lucene - Core
          Issue Type: Improvement
    Affects Versions: 6.5.1
            Reporter: Thomas Poppe
            Priority: Minor


Consider the following example:

    public static void main(String[] args) {
        org.apache.lucene.util.automaton.RegExp regExp =
                new org.apache.lucene.util.automaton.RegExp("[a-z]{1,13}x[a-z][a-z]?[a-z]?[a-z]?[a-z]?[a-z]{0,8}");
        org.apache.lucene.util.automaton.Automaton automaton = regExp.toAutomaton();
        System.out.println("states: " + automaton.getNumStates());
        System.out.println("transitions: " + automaton.getNumTransitions());
        System.out.println("-------------------------------");

        try {
            regExp = new org.apache.lucene.util.automaton.RegExp("[a-z]{1,13}x[a-z]{1,13}");
            automaton = regExp.toAutomaton();
            System.out.println("Will not happen...");
        } catch (org.apache.lucene.util.automaton.TooComplexToDeterminizeException e) {
            automaton = regExp.toAutomaton(1_000_000);
            System.out.println("states: " + automaton.getNumStates());
            System.out.println("transitions: " + automaton.getNumTransitions());
            System.out.println("-------------------------------");
        }
    }

Both regular expressions are equivalent, but it's much more efficient to "unroll" the repetition.
 It might be possible to optimize the Regex#toAutomaton() method to handle this repetition
without going over the default number of determinized states, and using less memory and CPU?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message