lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <>
Subject [jira] [Commented] (LUCENE-3216) Store DocValues per segment instead of per field
Date Thu, 30 Jun 2011 15:59:28 GMT


Simon Willnauer commented on LUCENE-3216:

{quote}So this means, if you use default StandardCodec, and 3 fields store
doc values, and "main" CFS is off but doc values CFS is on, you'll see
a cfs file holding the 3-6 sub-files that your docvalues created,

But eg if some fields use another codec, then that codec will have its
own CFS for any fields it has with docvalues (this is the TODO)?
That seems fine for starters.{quote}

Again correct. What I have in mind is a "global" cfs that holds all of them and that a codec
can pull via PerDocWriteState or something similar, but for now having this per codec is
fine IMO. I will create a follow-up issue for this.
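
The "global" cfs idea can be illustrated with a minimal sketch. Everything here is hypothetical: SketchPerDocWriteState, SketchCompoundWriter, and the sub-file names are made up for illustration and are not the classes or signatures in the patch. The only point is that every codec asking the per-segment state gets the same compound container, so all docvalues sub-files land in one place.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical container that collects named sub-files for one compound file. */
final class SketchCompoundWriter {
    private final Map<String, byte[]> subFiles = new LinkedHashMap<>();

    /** Registers a sub-file; names must be unique across all codecs. */
    void addFile(String name, byte[] data) {
        if (subFiles.putIfAbsent(name, data.clone()) != null) {
            throw new IllegalArgumentException("duplicate sub-file: " + name);
        }
    }

    int fileCount() {
        return subFiles.size();
    }

    /** Sum of the lengths of all registered sub-files. */
    long totalLength() {
        long total = 0;
        for (byte[] data : subFiles.values()) {
            total += data.length;
        }
        return total;
    }
}

/** Hypothetical per-segment state: hands the same compound writer to every caller. */
final class SketchPerDocWriteState {
    private SketchCompoundWriter shared;

    /** Lazily creates the shared writer so segments without docvalues pay nothing. */
    SketchCompoundWriter sharedCompound() {
        if (shared == null) {
            shared = new SketchCompoundWriter();
        }
        return shared;
    }
}
```

With a per-codec CFS, each codec would instead own its own SketchCompoundWriter, which is the simpler arrangement described as fine for now.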

bq. For the nested test... couldn't you createCompoundOutput directly from an opened CompoundFileDirectory?
(Vs creating externally & copying in).
Yes, I could, but this functionality is tricky and not needed currently, so I left it out for
now.
{quote}I like CodecConfig, but I'm not sure it should hold things specific
only to 1 codec, like the Pulsing cutoff? The other settings seem
more widely applicable... though I guess even terms cache size is not
used by various codecs, but it is by enough to have it in
CodecConfig, I think?{quote}

I am not sure here. I had the same thought, but when you look at Solr and other high-level
users, they need to configure stuff somehow, so I put all reasonable core stuff in there. I
think it's OK to have this for only one codec. Thoughts?
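
The trade-off under discussion can be sketched as follows. This is a hypothetical holder, not the CodecConfig from the patch: the class name, setter names, and default values are all assumptions. It keeps a widely used setting (terms cache size) next to one that only a pulsing-style codec would read (the cutoff), so a high-level user such as Solr has a single place to configure both.

```java
/** Hypothetical settings holder; names and defaults are illustrative only. */
final class SketchCodecConfig {
    // Assumed default; read by several codecs in this sketch.
    private int termsCacheSize = 1024;
    // Assumed default; only a pulsing-style codec would read this.
    private int pulsingFreqCutoff = 1;

    SketchCodecConfig setTermsCacheSize(int size) {
        if (size < 1) {
            throw new IllegalArgumentException("termsCacheSize must be >= 1");
        }
        this.termsCacheSize = size;
        return this;
    }

    SketchCodecConfig setPulsingFreqCutoff(int cutoff) {
        if (cutoff < 1) {
            throw new IllegalArgumentException("pulsingFreqCutoff must be >= 1");
        }
        this.pulsingFreqCutoff = cutoff;
        return this;
    }

    int getTermsCacheSize() {
        return termsCacheSize;
    }

    int getPulsingFreqCutoff() {
        return pulsingFreqCutoff;
    }
}
```

Codecs that do not care about a setting simply never call its getter, which is the cost of keeping a single-codec option in the shared config.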

> Store DocValues per segment instead of per field
> ------------------------------------------------
>                 Key: LUCENE-3216
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>             Fix For: 4.0
>         Attachments: LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216.patch,
LUCENE-3216.patch, LUCENE-3216.patch, LUCENE-3216_floats.patch
> Currently we store docvalues per field, which results in at least one file per field
that uses docvalues (or at most two per field per segment, depending on the impl.). Yet we
should, by default, try to pack docvalues into a single file if possible. To enable this we
need to hold all docvalues in memory during indexing and write them to disk once we flush
a segment.
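
The approach in the description (buffer all docvalues in memory, then write them together at flush) can be sketched as below. This is a simplified illustration under stated assumptions, not the implementation in the attached patches: the class name is hypothetical, only fixed-width long values are handled, documents without a value default to 0, and the "file" is just a byte array with a per-field offset table.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical buffer: holds per-field long values in memory until the segment is flushed. */
final class SketchSegmentDocValuesBuffer {
    private final int maxDoc;
    private final Map<String, long[]> valuesByField = new LinkedHashMap<>();

    SketchSegmentDocValuesBuffer(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Records one value; documents without a value stay at 0 in this sketch. */
    void add(String field, int docId, long value) {
        valuesByField.computeIfAbsent(field, f -> new long[maxDoc])[docId] = value;
    }

    /**
     * Packs every buffered field into a single byte array and records where
     * each field starts, so one file can serve all fields of the segment.
     */
    byte[] flush(Map<String, Integer> offsets) {
        ByteBuffer out = ByteBuffer.allocate(valuesByField.size() * maxDoc * Long.BYTES);
        for (Map.Entry<String, long[]> entry : valuesByField.entrySet()) {
            offsets.put(entry.getKey(), out.position());
            for (long value : entry.getValue()) {
                out.putLong(value);
            }
        }
        valuesByField.clear(); // the in-memory buffer is released once the segment is flushed
        return out.array();
    }
}
```

A reader would look up a field's start in the offset table and seek to start + docId * 8, instead of opening one file per field.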

This message is automatically generated by JIRA.