lucene-java-user mailing list archives

From Erik Hatcher
Subject Re: document field updates
Date Thu, 01 Mar 2007 10:47:21 GMT

On Feb 28, 2007, at 8:59 AM, Steven Parkes wrote:

> 	Are unindexed fields stored separately from the main inverted index?
> 	If so, then one could implement the field value change as a delete and
> 	re-add of just that value?
> The short answer is that won't work. Field values are stored in a
> different data structure than the postings lists, but docids are
> consistent across all contents of a segment. Deleting something and
> re-adding it is going to put it into a different segment, which is going
> to keep this from working. (Not to mention that you want the postings
> lists updated if you want it to be searchable ...)
> 	Are you aware of some implementation of Lucene that solves this need
> 	well with a second index for 'tags' complete with multi-index boolean
> 	queries?
> I'm pretty sure this has been done, I'm just not 100% sure where. Does
> Nutch index link text?

Nutch does do this sort of thing, but I'm not quite sure how.  It  
isn't doing any operations to the Lucene index beyond what plain ol'  
Lucene does.

> I don't know if Solr has anything like this, but if I remember
> correctly, Collex has tags, but as far as I can tell, it's not been
> open sourced (yet?)

Collex is quite open source, it's just ugly source :)  We're the
'patacriticism' project at SourceForge, under the "collex" directory
in Subversion.

Collex implements tagging by implementing JOIN cross-references
between user/tag documents and regular object documents.  Its
scalability is not going to be good at bigger numbers in its current
architecture, but it works quite well for our 60k or so objects at
the moment.
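
To make the join idea concrete, here is a rough sketch of the general
approach. It is not Collex's code and uses no Lucene classes: the class
name TagJoinSketch, the TagDoc layout, and the ids are all made up for
illustration, and plain in-memory collections stand in for the two sets
of documents. The point is only that tagging adds a small user/tag
document that refers to an object by id, so the object document itself
never has to be rewritten.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy in-memory model of a tag/object join. Hypothetical names throughout;
// a real implementation would query Lucene indexes instead of these collections.
final class TagJoinSketch {

    // A user/tag document that cross-references one object by its id.
    static final class TagDoc {
        final String user;
        final String tag;
        final String objectId;

        TagDoc(String user, String tag, String objectId) {
            this.user = user;
            this.tag = tag;
            this.objectId = objectId;
        }
    }

    // Stand-ins for the two kinds of documents.
    private final List<TagDoc> tagDocs = new ArrayList<>();
    private final Map<String, String> objectDocs = new LinkedHashMap<>(); // id -> title

    void addObject(String id, String title) {
        objectDocs.put(id, title);
    }

    // Tagging only appends a small document; the object document is untouched.
    void tag(String user, String tag, String objectId) {
        tagDocs.add(new TagDoc(user, tag, objectId));
    }

    // Step 1: find tag documents matching the tag and collect their object ids.
    // Step 2: resolve those ids against the object documents (the "join").
    List<String> titlesTagged(String tag) {
        Set<String> ids = new LinkedHashSet<>();
        for (TagDoc d : tagDocs) {
            if (d.tag.equals(tag)) {
                ids.add(d.objectId);
            }
        }
        List<String> titles = new ArrayList<>();
        for (String id : ids) {
            String title = objectDocs.get(id);
            if (title != null) {
                titles.add(title);
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        TagJoinSketch sketch = new TagJoinSketch();
        sketch.addObject("obj-1", "First object");
        sketch.addObject("obj-2", "Second object");
        sketch.tag("alice", "poetry", "obj-2");
        sketch.tag("bob", "poetry", "obj-1");
        sketch.tag("alice", "maps", "obj-1");
        System.out.println(sketch.titlesTagged("poetry"));
    }
}
```

In this toy version the second step does one lookup per matching tag
document, so the work grows with the number of tagged objects a query
touches. That is one plausible reason a join like this gets slow at
bigger numbers, though it is only a guess about the general shape of the
problem.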

