lucene-java-user mailing list archives

From Juarez Sampaio <jua...@simbioseventures.com>
Subject Automata and Transducer on Lucene 6
Date Tue, 18 Apr 2017 13:58:24 GMT
Hello everyone,

Recently I've watched a few videos and read a few blog posts on Lucene's
Automata and how one can speed up things by 100x when properly using
Automata and Transducers. "I can definitely use a boost like this", right?
The problem is that the material I've read was written for Lucene 4, and it
seems the API has changed a lot since then.

To begin with, *I can't find transducers* anywhere, and I'm missing a few
Automata construction capabilities such as union (it used to be located in
the class BasicOperations). I think what I am really missing is an intro to
Automata classes on Lucene 6. *Can someone point me to a link introducing
Automata (and possibly Transducers) on Lucene 6?*

So far I've been learning by navigating the Javadocs with ctrl + F, which
hasn't been productive: it took me a while to figure out that I had to use an
AutomatonRun to check whether the automaton accepts a given char sequence.
Before that, I had tried to start from node 0, manually traverse the
Automaton, and check for a final state at the end of a String. I'd really
appreciate some guidance here.
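
To make the manual traversal concrete, here is a minimal, library-free sketch of what I mean by starting at node 0, following one transition per char, and checking for a final state at the end. The class TinyDfa, its methods, and its two-array layout are hypothetical names made up for this illustration. They are not Lucene's API and are not claimed to match its internal representation.

```java
import java.util.BitSet;

/** Hypothetical toy DFA used only for illustration; not part of Lucene. */
final class TinyDfa {
    // Assumed layout: two ints per state.
    // states[2*s]     = offset of state s's first transition in `transitions`
    // states[2*s + 1] = number of transitions leaving state s
    private final int[] states;
    // Assumed layout: three ints per transition:
    // destination state, min char, max char (both bounds inclusive).
    private final int[] transitions;
    // Bit s is set when state s is a final (accepting) state.
    private final BitSet accept;

    TinyDfa(int[] states, int[] transitions, BitSet accept) {
        this.states = states;
        this.transitions = transitions;
        this.accept = accept;
    }

    /** Returns the state reached from `state` on `ch`, or -1 if no transition matches. */
    int step(int state, int ch) {
        int offset = states[2 * state];
        int count = states[2 * state + 1];
        for (int i = 0; i < count; i++) {
            int t = offset + 3 * i;
            if (ch >= transitions[t + 1] && ch <= transitions[t + 2]) {
                return transitions[t];
            }
        }
        return -1;
    }

    /** Walks the input from state 0 and reports whether it ends in a final state. */
    boolean accepts(CharSequence input) {
        int state = 0;
        for (int i = 0; i < input.length(); i++) {
            state = step(state, input.charAt(i));
            if (state == -1) {
                return false; // dead end: no transition for this char
            }
        }
        return accept.get(state);
    }

    /** Builds a three-state DFA for the language "a", then any number of "b", then "c". */
    static TinyDfa example() {
        // state 0 --a--> 1, state 1 --b--> 1, state 1 --c--> 2, state 2 is final
        int[] states = {0, 1, 3, 2, 9, 0};
        int[] transitions = {1, 'a', 'a', 1, 'b', 'b', 2, 'c', 'c'};
        BitSet accept = new BitSet();
        accept.set(2);
        return new TinyDfa(states, transitions, accept);
    }
}
```

With this toy, TinyDfa.example().accepts("abbc") walks 0, 1, 1, 1, 2 and ends in the final state, while "ab" stops in state 1, which is not final.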

I'd like to read something written by whoever designed these classes: what
motivated them, usage examples, what they are good for and what they are not
good for. Maybe a history of the development of Automata in Lucene. Were they
built for in-memory usage only? Is there a good way to go about serializing them?
If possible, I'd like some explanation of the mad pointer structure used
to efficiently implement automata. From the videos I watched I was
expecting a byte[] implementation, but looking at the code I see a couple
of int[] used to represent states and transitions. What happened to the
byte[] implementation of Lucene 4?
-- 
Juarez
