lucene-dev mailing list archives

From Chris Hostetter <>
Subject Re: Question about FieldInfos
Date Sun, 15 Jan 2006 20:51:25 GMT
: Option 1: Merge field definitions at the segment level rather than
: the Document level. The defs stay stored with individual segments,
: but everything gets moved into the .fnm file, including
: IS_COMPRESSED, IS_BINARY, etc (as I believe Robert was proposing).
: Option 2: Centralize the field definitions; allow new field
: definitions to be added, but never allow modifications to individual
: field definitions, just to the list.  This is roughly analogous to
: ALTER TABLE in SQL, but more limited since you can't make arbitrary
: changes.
: I like option 1 better.
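A minimal Java sketch of Option 2 as the quote describes it: one index-wide list of field definitions. New fields can be appended to it, and re-declaring an identical definition is harmless, but an existing entry can never be changed. `FieldDef`, `CentralFieldDefs` and their boolean properties are hypothetical illustrations, not Lucene's actual `FieldInfos` API.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical immutable description of one field's properties. */
final class FieldDef {
    final String name;
    final boolean indexed;
    final boolean compressed;
    final boolean binary;

    FieldDef(String name, boolean indexed, boolean compressed, boolean binary) {
        this.name = Objects.requireNonNull(name);
        this.indexed = indexed;
        this.compressed = compressed;
        this.binary = binary;
    }

    @Override public boolean equals(Object o) {
        if (!(o instanceof FieldDef)) return false;
        FieldDef d = (FieldDef) o;
        return name.equals(d.name) && indexed == d.indexed
            && compressed == d.compressed && binary == d.binary;
    }

    @Override public int hashCode() {
        return Objects.hash(name, indexed, compressed, binary);
    }
}

/** Option 2 sketch: the list can grow, but existing definitions never change. */
final class CentralFieldDefs {
    private final Map<String, FieldDef> defs = new LinkedHashMap<>();

    /** Adds a new field, accepts an identical re-declaration, rejects any modification. */
    void define(FieldDef def) {
        FieldDef existing = defs.get(def.name);
        if (existing == null) {
            defs.put(def.name, def);
        } else if (!existing.equals(def)) {
            throw new IllegalArgumentException(
                "field '" + def.name + "' is already defined differently");
        }
    }

    /** Read-only view of all definitions, in the order they were added. */
    Map<String, FieldDef> view() {
        return Collections.unmodifiableMap(defs);
    }
}
```

The append-only rule is what makes this "more limited" than ALTER TABLE: adding `photo` later is fine, but turning on compression for an existing `title` field throws.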

I don't really understand the low-level file format details of Lucene (I
let Yonik worry about those things for me), and I've already forgotten the
details from earlier in this thread about which field properties are stored
on a per-field basis and which are stored on a per-document basis. But as
someone who has been burned by Lucene storing a field norm for field F for
all docs as soon as one doc has a value for field F, my gut reaction is to
shy away from any proposal to move an existing document-based field property
to the segment level.
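The norms complaint can be sketched as a merge rule. This hypothetical Java example assumes that when per-document field properties are merged into one segment-level definition (in the spirit of Option 1), each boolean property combines by logical OR. Under that rule, one document with norms for field F turns norms on for the whole segment. It also assumes one norm byte per document for such a field, as Lucene's norms files of that era stored. `SegmentFieldMerge` and `Flags` are illustrative names, not Lucene classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical per-segment merge of field properties (Option 1 style). */
final class SegmentFieldMerge {

    /** Per-field flags; names are illustrative, not Lucene's FieldInfo API. */
    static final class Flags {
        final boolean indexed;
        final boolean hasNorms;

        Flags(boolean indexed, boolean hasNorms) {
            this.indexed = indexed;
            this.hasNorms = hasNorms;
        }
    }

    /** Merges per-document flags into one segment-level map; any "true" wins (logical OR). */
    static Map<String, Flags> merge(List<Map<String, Flags>> docs) {
        Map<String, Flags> segment = new HashMap<>();
        for (Map<String, Flags> doc : docs) {
            for (Map.Entry<String, Flags> e : doc.entrySet()) {
                Flags f = e.getValue();
                // Copy so the segment map never aliases a document's flags.
                segment.merge(e.getKey(), new Flags(f.indexed, f.hasNorms),
                    (a, b) -> new Flags(a.indexed || b.indexed, a.hasNorms || b.hasNorms));
            }
        }
        return segment;
    }

    /** Assumed cost of norms for a field: one byte for every document in the segment. */
    static long normBytes(Map<String, Flags> segment, String field, long docCount) {
        Flags f = segment.get(field);
        return (f != null && f.hasNorms) ? docCount : 0L;
    }
}
```

With a million documents in the segment, `normBytes(seg, "F", 1_000_000L)` returns 1,000,000 even if only one of those documents ever had a value for F.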

: > : Is it really necessary to be
: > : able to define new fields "at any time"?
: >
: > Absafreakinglutely.
: Down to the granularity of indexWriter.addDocument() ?  Would it work
: to open a new IndexWriter?

Well, sure, it would work -- in the sense that I could always open a new
IndexWriter for every document I wanted to add :)

I completely admit I don't have the simplest use case -- but when I'm
indexing product data, every type of product has very different fields
from every other type of product (typically name, manufacturer, and low
price are the only fields that *every* product has). And as a steady
(never-ending) stream of product additions/modifications comes in, there's
no expectation that any product has the same type (or fields) as the
product before it or the product after it.  Since I do batch my updates
every N minutes, I could in fact divide up the batches into groups by type
(common fields) and update each group with one IndexWriter ... but even
then, the number of groups would be much higher than the number of
products in each group, so I'd be opening a *lot* of new IndexWriters.
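The batching workaround (group each batch by its set of fields, then use one IndexWriter per group) can be sketched in Java. `BatchByFieldSet.group` is a hypothetical helper; products are modeled as plain `Map<String, String>` objects, and no Lucene classes are used. Under this sketch, the number of writers to open per batch would be `group(batch).size()`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical batching: group a batch of products by their field-name set. */
final class BatchByFieldSet {

    /**
     * Each product is modeled as field name -> value. Products with exactly
     * the same set of field names land in the same group, and each group
     * would get its own IndexWriter.
     */
    static Map<Set<String>, List<Map<String, String>>> group(List<Map<String, String>> batch) {
        Map<Set<String>, List<Map<String, String>>> groups = new LinkedHashMap<>();
        for (Map<String, String> product : batch) {
            // Sorted copy of the field names, used as the group key.
            Set<String> fields = new TreeSet<>(product.keySet());
            groups.computeIfAbsent(fields, k -> new ArrayList<>()).add(product);
        }
        return groups;
    }
}
```

When nearly every product in a batch has its own distinct field set, `group(batch).size()` approaches `batch.size()`, which is exactly the "opening a *lot* of new IndexWriters" problem described above.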

