lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <>
Date Mon, 19 Feb 2007 19:32:06 GMT
FWIW, we support, in our in-house system and in addition to fixed  
field semantics,  completely dynamic field names for some  
applications, but they have a fixed field type.  So, the field name  
can be anything, but the attributes of the field are fixed (i.e. it  
will always be tokenized with norms). This is useful for us, in some  
cases, when indexing XML files where the tag name becomes the field  
name and the set of tag names are not known ahead of time.  I suppose  
there are ways around this (by preprocessing all the files), but  
having the ability to add arbitrary fields is a good thing for us and  
some of the applications we do.


On Feb 19, 2007, at 2:14 AM, Marvin Humphrey wrote:

> On Feb 18, 2007, at 10:33 PM, Chris Hostetter wrote:
>> I don't suppose you have a mailing pointer to my old comments do you
>> Marvin?
> (
> You're in good company.  The other party with strong objections was  
> Doug.
>> en if i wanted to be able to use
>> an option on field foo for some docs and not on others i'd have to  
>> have
>> foo_optOFF and foo_optON ... then anytime i wanted to search on  
>> "foo" i'd
>> have to use a booleanquery without a coord factor across both.
> I'm trying to think of an example where that would actually come  
> into play.  What are some of the options you'd turn on and off?   
> Norms?  Tokenization?
> IIUC, that's a third objection in addition to your two from the  
> previous discussion.  The other two were "evolving" indexes, and  
> what you described as "dynamically typed fields" but what I would  
> call "multi-dimensional data".
> KinoSearch's Schema scheme allows you to add new fields, but you  
> can't take any away or change any existing defs  -- so you're able  
> to evolve, but only within fairly tight constraints.  I can't think  
> of an elegant way to improve that situation, so I've declared that  
> aspect "good enough" and we're moving on.  I don't think it's  
> really any worse than what we have now -- where field defs persist,  
> stored in field infos files, etc, and the resolution of conflicts  
> can bite you in the butt (as I recall hearing you discuss when no- 
> norms was being hashed out wrt suddenly having lots and lots of  
> memory-sucking norms).
> The multi-dimensional data problem is the one I'm most interested  
> in solving.  Lucene/KS indexes don't handle one-to-many  
> relationships well.  In your example, you had an index where the  
> products had "attributes", and the attributes might have taken many  
> different names -- so it wasn't possible to know all the field  
> names in advance.  It sounds like, effectively, you were faking a  
> second table.  So long as the names of your attributes don't crash  
> into the names of your primary fields, that'll work.
> At present, KS doesn't let you do something like that -- you have  
> to define all your fields up front.  What I'd like to do is come up  
> with a FieldDef subclass that handles multi-dimensional data.  I  
> seem to recall that Solr had something along those lines, using  
> prefixed field names or something.  Do I recall correctly?
> Marvin Humphrey
> Rectangular Research
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Grant Ingersoll

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message