lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Smiley <>
Subject Re: Improve performance of FST Arc traversal
Date Fri, 26 Apr 2019 14:30:42 GMT
Maybe what you propose would allow FST.cachedRootArcs to be eliminated?
Basically your algorithm could decide that the root would always be direct

~ David Smiley
Apache Lucene/Solr Search Developer

On Thu, Apr 25, 2019 at 5:18 PM Dawid Weiss <> wrote:

> > I'm curious how the hot cache you describe would be maintained and
> > accessed.
> The only gain I was successful at was with static precomputed cache.
> Calculated from a priori data (hot paths) or, in the extreme, the
> entire FST is converted to a hash map. :) This allows very fast
> lookups of any node-arc pair, but is not usable for enumerating
> outgoing arcs (for example)... It's really something we did to
> accelerate millions and millions of lookups over the FST (in an
> otherwise not even Lucene-related data structure). I don't think it'll
> be generally useful.
> > I know eg that the current implementation has a cache of the
> > first 128 "root" arcs, which I guess are presumed to be
> > highly-visited, but I think what you are describing is a more dynamic
> > cache based on usage?
> I don't think it's an assumption of high-visited ratio... it's just a
> limit so that we don't cache very extreme root node fan-outs. The
> choice of this limit is very arbitrary I'd say...
> > Were you thinking that one would maintain an LRU
> > cache say? Or was this some offline analysis you did based on access
> > patterns?
> On patterns. So not generally useful.
> Don't get me wrong -- try to experiment with that constant-expanded
> array... This can be useful especially for nodes close to the root (if
> they're dense)... So definitely worth looking into.
> Dawid
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

View raw message