lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] [Created] (LUCENE-7905) Optimizations for OrdinalMap
Date Wed, 12 Jul 2017 14:59:00 GMT
Michael McCandless created LUCENE-7905:
------------------------------------------

             Summary: Optimizations for OrdinalMap
                 Key: LUCENE-7905
                 URL: https://issues.apache.org/jira/browse/LUCENE-7905
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Michael McCandless
            Assignee: Michael McCandless
             Fix For: 7.1
         Attachments: LUCENE-7905.patch

{{OrdinalMap}} is a useful class for quickly mapping per-segment ordinals to the global ordinal
space, but it is fairly costly to build, and it typically must be rebuilt on every NRT refresh.

I'm using it quite heavily in two different places, one for {{SortedSetDocValuesFacetCounts}}
and one custom usage, and I found some small optimizations that improve its construction
time.

I switched it to use a simple priority queue to merge the terms instead of the more general
{{MultiTermsEnum}}, which does extra work because it must also provide postings, implement
{{seekExact}}, etc.
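
As a rough illustration of the priority-queue merge described above (this is not the code
from the attached patch), the sketch below merges per-segment sorted term lists and records
a segment-ordinal to global-ordinal mapping. The names {{OrdinalMergeSketch}}, {{Cursor}} and
{{buildGlobalOrds}} are hypothetical, plain Java strings stand in for Lucene's term bytes,
and {{String.compareTo}} order is assumed in place of Lucene's byte order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class OrdinalMergeSketch {

    /** Hypothetical stand-in for a per-segment terms enum: walks one sorted, de-duplicated term list. */
    private static final class Cursor {
        final int segment;
        final List<String> terms;
        int ord = 0; // current per-segment ordinal

        Cursor(int segment, List<String> terms) {
            this.segment = segment;
            this.terms = terms;
        }

        String term() {
            return terms.get(ord);
        }

        /** Moves to the next term; returns false when the segment is exhausted. */
        boolean advance() {
            return ++ord < terms.size();
        }
    }

    /**
     * Returns segToGlobal[segment][segmentOrd] = globalOrd.
     * Assumes each inner list is sorted and contains no duplicates.
     */
    public static long[][] buildGlobalOrds(List<List<String>> segments) {
        long[][] segToGlobal = new long[segments.size()][];
        // The queue is ordered by each cursor's current term.
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> a.term().compareTo(b.term()));

        for (int i = 0; i < segments.size(); i++) {
            List<String> terms = segments.get(i);
            segToGlobal[i] = new long[terms.size()];
            if (!terms.isEmpty()) {
                queue.add(new Cursor(i, terms));
            }
        }

        long globalOrd = 0;
        while (!queue.isEmpty()) {
            String current = queue.peek().term();
            // Every cursor positioned on the same term shares one global ordinal.
            while (!queue.isEmpty() && queue.peek().term().equals(current)) {
                Cursor c = queue.poll();
                segToGlobal[c.segment][c.ord] = globalOrd;
                if (c.advance()) {
                    queue.add(c); // re-insert with its next (strictly larger) term
                }
            }
            globalOrd++;
        }
        return segToGlobal;
    }

    public static void main(String[] args) {
        long[][] map = buildGlobalOrds(List.of(
            List.of("apple", "cherry"),
            List.of("banana", "cherry", "date")));
        System.out.println(Arrays.toString(map[0])); // [0, 2]
        System.out.println(Arrays.toString(map[1])); // [1, 2, 3]
    }
}
```

The merge only needs to step forward through each segment's terms, so nothing beyond a
"next term" operation is required of the per-segment cursor in this sketch.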

I also pulled {{OrdinalMap}} out into its own class in the {{org.apache.lucene.index}} package.

In my construction-time tests, the patch is ~16% faster (159.9 s -> 134.2 s) in one case
with 91.4 M terms and ~9% faster (115.6 s -> 105.7 s) in another case with 26.6 M terms.



