lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dawid Weiss <dawid.we...@gmail.com>
Subject Re: Levenshtein FST's?
Date Sun, 29 May 2016 08:26:57 GMT
> I think I see this now, and how skipping determinization and matching with
> the NFA could easily leave you with an intractable amount of backtracking
> for even the simpler binary question of does my input match any of the
> automatons I've unioned.

Note that with NFAs you may answer the question of whether the input
matches without completing all the partial NFA matching states. So it
may be a viable solution, actually. Much depends on your particular
problem.

> Ok, yea I could see these being the same ones whose union is not too large
> to determinize.

The difference is that you can create a minimal deterministic
automaton for a set of (sorted) input strings in O(n) -- it is really,
really fast and memory friendly. You can also fairly trivially
enumerate (and store) the set of unique strings accepted by automata.
So this is, again, something to simply try in real life -- could be a
way to build a minimal automaton out of a large number of smaller
automata.

> re2 does look neat for matching regexes which can be expressed as DFAs.

I actually wanted to point you at the papers published by Russ Cox,
not the implementation itself. See:

https://swtch.com/~rsc/regexp/

> I think I see now how determinizing the union of (Levenshtein) DFAs likely
> has no "algorithmic shortcut" I'll be able to devise :)

My reply #3 was only partially a joke. There are no naive questions
and the problem may have a fast algorithmic solution (I don't know of
one though). If this is of interest to you, and it's definitely a nice
research topic, I'd contact somebody in the academia who is rooted
deeper in automata theory.

Dawid

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message