lucene-dev mailing list archives

From "Han Jiang (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
Date Mon, 15 Jul 2013 14:16:49 GMT


Han Jiang commented on LUCENE-3069:

bq. I think we should assert that the seekCeil returned SeekStatus.FOUND?

Ok! I'll commit that.
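
As a rough illustration of the suggested assertion, here is a minimal sketch that uses a `TreeSet` as a stand-in for the term dictionary. The class `SeekCeilSketch`, its local `SeekStatus` enum, and `seekKnownTerm` are hypothetical names made up for this example; they are not the patch's code or Lucene's API.

```java
import java.util.TreeSet;

/** Hypothetical stand-in for a terms enum; not Lucene code. */
final class SeekCeilSketch {
    /** Local enum mirroring the three outcomes discussed in the thread. */
    enum SeekStatus { FOUND, NOT_FOUND, END }

    private final TreeSet<String> terms = new TreeSet<>();
    private String current; // null when the enum is not positioned

    SeekCeilSketch(String... dictionaryTerms) {
        for (String t : dictionaryTerms) {
            terms.add(t);
        }
    }

    /** Positions on the smallest term >= target and reports how it matched. */
    SeekStatus seekCeil(String target) {
        current = terms.ceiling(target);
        if (current == null) {
            return SeekStatus.END;
        }
        return current.equals(target) ? SeekStatus.FOUND : SeekStatus.NOT_FOUND;
    }

    String term() {
        return current;
    }

    /** For a caller that already knows the term exists: anything but FOUND is a bug. */
    void seekKnownTerm(String target) {
        SeekStatus status = seekCeil(target);
        assert status == SeekStatus.FOUND
            : "expected FOUND for " + target + " but got " + status;
    }
}
```

The assertion only fires when the JVM runs with `-ea`, so it documents the expectation without adding cost to production runs.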

bq. useCache is an ancient option from back when we had a terms dict cache

Yes, I suppose it is not 'clear' to have this parameter.

bq. seekExact is working as it should I think.

Currently, I think those 'seek' methods are supposed to move the enum pointer based on the
input term string and fetch the related metadata from the term dict.

However, seekExact(BytesRef, TermState) simply copies the value of termState to the enum; it
doesn't actually perform a 'seek' on the dictionary.
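
The distinction can be sketched as follows, assuming a sorted map as the dictionary and a plain `int` as the cached term state. `SeekVersusCopySketch` and its `dictLookups` counter are hypothetical and exist only to make the difference observable.

```java
import java.util.TreeMap;

/** Hypothetical enum over an in-memory dictionary (term -> docFreq); not Lucene code. */
final class SeekVersusCopySketch {
    private final TreeMap<String, Integer> dict = new TreeMap<>();
    private String term;
    private int docFreq;
    int dictLookups; // counts real dictionary accesses

    void add(String t, int df) {
        dict.put(t, df);
    }

    /** A real seek: consults the dictionary and loads the term's metadata. */
    boolean seekExact(String target) {
        dictLookups++;
        Integer found = dict.get(target);
        if (found == null) {
            return false;
        }
        term = target;
        docFreq = found;
        return true;
    }

    /** State-based variant: trusts the caller's state and never touches the dictionary. */
    void seekExact(String target, int cachedDocFreq) {
        term = target;
        docFreq = cachedDocFreq;
    }

    String term() {
        return term;
    }

    int docFreq() {
        return docFreq;
    }
}
```

In this sketch the second overload is cheap precisely because it skips the lookup, which is also why it cannot validate that the supplied state belongs to the term.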

bq. Maybe instead of term and meta members, we could just hold the current pair?

Oh, yes, I once thought about this, but I'm not sure: can the callee always make sure that,
when 'term()' is called, it will return a valid term?
The code in MemoryPF just returns 'pair.output' regardless of whether pair==null; is that safe?
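
One way to picture the "hold the current pair" idea with an explicit null check is the sketch below. It is not the MemoryPF code; `PairHoldingEnumSketch` is a hypothetical class, and returning null from `term()` when unpositioned is just one possible contract, assumed here for illustration.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical enum that keeps one (term, docFreq) pair instead of separate fields. */
final class PairHoldingEnumSketch {
    private final Iterator<Map.Entry<String, Integer>> it;
    private Map.Entry<String, Integer> pair; // null = unpositioned or exhausted

    PairHoldingEnumSketch(TreeMap<String, Integer> dict) {
        this.it = dict.entrySet().iterator();
    }

    /** Advances to the next term, or returns null once the terms are exhausted. */
    String next() {
        pair = it.hasNext() ? it.next() : null;
        return term();
    }

    /** Guarded accessor: never dereferences a null pair. */
    String term() {
        return pair == null ? null : pair.getKey();
    }

    /** Fails loudly instead of throwing a bare NullPointerException. */
    int docFreq() {
        if (pair == null) {
            throw new IllegalStateException("enum is not positioned on a term");
        }
        return pair.getValue();
    }
}
```

Whether the guard is needed depends on whether callers are allowed to call `term()` on an unpositioned enum; if that is declared illegal, an unguarded access would be acceptable.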

bq. TempMetaData.hashCode() doesn't mix in docFreq/tTF?

Oops! Thanks, nice catch!
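
For reference, a hash code that mixes in both statistics could look like the sketch below. `TermMetaSketch` and its fields are hypothetical; the actual fields of TempMetaData in the patch are not shown in this message.

```java
import java.util.Arrays;

/** Hypothetical per-term metadata holder; not the patch's TempMetaData. */
final class TermMetaSketch {
    final int docFreq;
    final long totalTermFreq;
    final byte[] postingsBytes; // opaque postings metadata

    TermMetaSketch(int docFreq, long totalTermFreq, byte[] postingsBytes) {
        this.docFreq = docFreq;
        this.totalTermFreq = totalTermFreq;
        this.postingsBytes = postingsBytes;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof TermMetaSketch)) {
            return false;
        }
        TermMetaSketch that = (TermMetaSketch) other;
        return docFreq == that.docFreq
            && totalTermFreq == that.totalTermFreq
            && Arrays.equals(postingsBytes, that.postingsBytes);
    }

    @Override
    public int hashCode() {
        // Every field compared in equals() also contributes to the hash.
        int hash = Arrays.hashCode(postingsBytes);
        hash = 31 * hash + Integer.hashCode(docFreq);
        hash = 31 * hash + Long.hashCode(totalTermFreq);
        return hash;
    }
}
```

Leaving the statistics out would still satisfy the equals/hashCode contract, but entries that differ only in docFreq or totalTermFreq would always collide.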

> Lucene should have an entirely memory resident term dictionary
> --------------------------------------------------------------
>                 Key: LUCENE-3069
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0-ALPHA
>            Reporter: Simon Willnauer
>            Assignee: Han Jiang
>              Labels: gsoc2013
>             Fix For: 4.4
>         Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch
> FST based TermDictionary has been a great improvement, yet it still uses a delta codec
> file for scanning to terms. Some environments have enough memory available to keep the entire
> FST based term dict in memory. We should add a TermDictionary implementation that encodes
> all needed information for each term into the FST (custom fst.Output) and builds an FST from
> the entire term, not just the delta.
