lucene-java-user mailing list archives

From "Neal Richter" <>
Subject Re: document field updates
Date Wed, 28 Feb 2007 06:14:14 GMT
On 2/27/07, Steven Parkes <> wrote:
> It is true that you can store more data and that will make it possible
> to get it back. Storing fields (w/ or w/o indexing) allows you to pull
> them back. Storing term vectors gives you something in-between nothing
> and everything.

I will look into term-vectors...

> However, you're still gonna get stuck on the "update" part. Lucene does
> not rewrite segments. It's fundamental to Lucene that it doesn't: from
> that a lot of Lucene's concurrent nature flows.

Are unindexed fields stored separately from the main inverted index?
If so, could one implement a field value change as a delete and
re-add of just that value?
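
For reference, the whole-document delete and re-add discussed in the
quoted text can be sketched as a toy model. This is plain Java over
in-memory maps, not the Lucene or CLucene API: the class
StoredFieldStore and all of its methods are hypothetical names, and the
sketch assumes every field of the document was stored so the document
can be rebuilt without the original page text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy in-memory model, NOT the Lucene or CLucene API. All names are hypothetical.
// A document is never modified in place: an "update" is a delete plus a re-add.
class StoredFieldStore {
    // document key -> (field name -> stored values)
    private final Map<String, Map<String, List<String>>> stored = new HashMap<>();
    // "field:term" -> keys of the documents that contain the term in that field
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Store every field value and index its whitespace-separated, lowercased terms.
    void add(String key, Map<String, List<String>> fields) {
        Map<String, List<String>> copy = new HashMap<>();
        for (Map.Entry<String, List<String>> field : fields.entrySet()) {
            copy.put(field.getKey(), new ArrayList<>(field.getValue()));
            for (String value : field.getValue()) {
                for (String term : value.toLowerCase().split("\\s+")) {
                    if (!term.isEmpty()) {
                        postings.computeIfAbsent(field.getKey() + ":" + term,
                                k -> new TreeSet<>()).add(key);
                    }
                }
            }
        }
        stored.put(key, copy);
    }

    // Remove the stored fields and every posting that points at the document.
    void delete(String key) {
        stored.remove(key);
        for (Set<String> keys : postings.values()) {
            keys.remove(key);
        }
    }

    // Add one value to one field by rebuilding the whole document from its
    // stored fields, deleting the old copy, and re-adding the rebuilt one.
    // Returns false if no document with that key exists.
    boolean appendValue(String key, String field, String value) {
        Map<String, List<String>> old = stored.get(key);
        if (old == null) {
            return false;
        }
        Map<String, List<String>> rebuilt = new HashMap<>();
        for (Map.Entry<String, List<String>> e : old.entrySet()) {
            rebuilt.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        rebuilt.computeIfAbsent(field, f -> new ArrayList<>()).add(value);
        delete(key);
        add(key, rebuilt);
        return true;
    }

    // Keys of the documents whose given field contains the term.
    Set<String> search(String field, String term) {
        Set<String> hits = postings.get(field + ":" + term.toLowerCase());
        if (hits == null) {
            return new TreeSet<>();
        }
        return new TreeSet<>(hits);
    }
}
```

The cost this model makes visible is that appendValue re-indexes every
field of the document, not only the one that changed.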

> Could you tell a little more about why delete/reinsert is not viable for
> you? A lot of people have dealt with this issue and come up with
> acceptable solutions ...

Anthony Arnone (first poster in thread) and I have been using CLucene
to change the guts of HtDig.  Here is a scenario:

We discover a new term that is relevant to a document already written
to the index, and we'd like to make that term searchable.

We spider URL "foo" and index all text associated with "foo", and the
doc goes into the index.  Next we discover a link back to foo on some
other page:  <a href="foo">beer</a>.  What would be nice is to
incrementally add the word 'beer' to the document 'foo' (in a special
linked-text field), along with any other link text discovered later,
without rewriting the document (which we no longer have in memory) and
without requiring a multi-index search where the linked-text field is
stored in a second index.

Note that the practical effect of the example above is much like
adding a new 'tag' to a document in the new world of tags.
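
To make the second-index alternative concrete, here is a minimal sketch
of it, again as a toy model in plain Java and not the Lucene or CLucene
API. The class LinkTextSideIndex and its methods are hypothetical names.
It assumes the URL serves as the join key between the main index
(written once at crawl time) and an append-only side index that holds
link text or tags.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy in-memory model, NOT the Lucene or CLucene API. All names are hypothetical.
// Page text and link text live in two separate term -> URL maps joined on the URL.
class LinkTextSideIndex {
    // Main index: term -> URLs of pages whose own text contains the term.
    private final Map<String, Set<String>> bodyPostings = new HashMap<>();
    // Side index: term -> URLs of pages that some link text (or tag) describes.
    private final Map<String, Set<String>> linkPostings = new HashMap<>();

    // Called once when a page is spidered.
    void indexPage(String url, String text) {
        addTerms(bodyPostings, url, text);
    }

    // Called whenever a link to targetUrl is found later; the main index is untouched.
    void addLinkText(String targetUrl, String anchorText) {
        addTerms(linkPostings, targetUrl, anchorText);
    }

    // Boolean OR across both indexes: the term occurs in the body or in link text.
    Set<String> searchEither(String term) {
        Set<String> hits = lookup(bodyPostings, term);
        hits.addAll(lookup(linkPostings, term));
        return hits;
    }

    // Boolean AND across both indexes: bodyTerm in the body and linkTerm in link text.
    Set<String> searchBoth(String bodyTerm, String linkTerm) {
        Set<String> hits = lookup(bodyPostings, bodyTerm);
        hits.retainAll(lookup(linkPostings, linkTerm));
        return hits;
    }

    // Index the whitespace-separated, lowercased terms of text under the given URL.
    private static void addTerms(Map<String, Set<String>> postings, String url, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, k -> new TreeSet<>()).add(url);
            }
        }
    }

    // Return a fresh, modifiable copy of the posting list (empty if the term is unknown).
    private static Set<String> lookup(Map<String, Set<String>> postings, String term) {
        Set<String> hits = postings.get(term.toLowerCase());
        if (hits == null) {
            return new TreeSet<>();
        }
        return new TreeSet<>(hits);
    }
}
```

In this model, adding 'beer' as link text for "foo" never touches the
page text already indexed for "foo"; the price is that every query has
to consult both maps and merge on the URL.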

Are you aware of some implementation of Lucene that solves this need
well with a second index for 'tags' complete with multi-index boolean
