lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-5667) Optimize common-prefix across all terms in a field
Date Mon, 12 May 2014 11:38:15 GMT
Michael McCandless created LUCENE-5667:
------------------------------------------

             Summary: Optimize common-prefix across all terms in a field
                 Key: LUCENE-5667
                 URL: https://issues.apache.org/jira/browse/LUCENE-5667
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 4.9, 5.0


I tested different UUID sources in Lucene
http://blog.mikemccandless.com/2014/05/choosing-fast-unique-identifier-uuid.html
and I was surprised to see that Flake IDs were slower than UUID V1.
They use the same raw sources of info (timestamp, node id, sequence
counter) but Flake ID preserves total order by keeping the timestamp
"intact" in the leading 64 bits.

I think the reason might be that a Flake ID will typically have a
longish common prefix across all docs, and I think we might be able to
optimize this in block-tree by storing that common prefix outside of
the FST, or maybe by just pre-computing the common prefix on init and
storing the "effective" start node for the FST.
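
To make the "longish common prefix" concrete, here is a minimal Java
sketch. It is not Lucene code and not the proposed patch: the class
name, the method names, and the 48-bit node id / 16-bit sequence split
are assumptions for illustration. Only the "timestamp in the leading
64 bits" layout comes from the description above.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Lucene.
final class CommonPrefixSketch {

    // Builds a 16-byte Flake-like ID: 64-bit timestamp first (big-endian),
    // then an assumed 48-bit node id and an assumed 16-bit sequence counter.
    static byte[] flakeLikeId(long timestampMillis, long nodeId, int sequence) {
        ByteBuffer buf = ByteBuffer.allocate(16); // big-endian by default
        buf.putLong(timestampMillis);             // leading 64 bits
        buf.putShort((short) (nodeId >>> 32));    // high 16 bits of the 48-bit node id
        buf.putInt((int) nodeId);                 // low 32 bits of the node id
        buf.putShort((short) sequence);           // 16-bit sequence counter
        return buf.array();
    }

    // Number of leading bytes that a and b share, looking at no more than limit bytes.
    static int sharedPrefixLength(byte[] a, byte[] b, int limit) {
        int max = Math.min(limit, Math.min(a.length, b.length));
        int i = 0;
        while (i < max && a[i] == b[i]) {
            i++;
        }
        return i;
    }

    // Longest prefix shared by every term; an empty array if there are no terms.
    static byte[] commonPrefix(List<byte[]> terms) {
        if (terms.isEmpty()) {
            return new byte[0];
        }
        byte[] first = terms.get(0);
        int prefixLen = first.length;
        for (byte[] term : terms) {
            // The shared prefix can only shrink as more terms are examined.
            prefixLen = sharedPrefixLength(first, term, prefixLen);
        }
        return Arrays.copyOf(first, prefixLen);
    }
}
```

IDs generated within a short time window differ only in the low bytes
of the timestamp, so commonPrefix over them returns most of the leading
8 bytes. That shared run of bytes is the part every lookup would
otherwise walk through the FST before reaching a node where terms
actually diverge.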




--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

