lucene-java-user mailing list archives

From: Antony Bowesman
Subject: Deleting a single TermPosition <doc, frequency, position> for a Document
Date: Tue, 08 Jan 2008 05:47:05 GMT

I'd like to 'update' a single Document in a Lucene index.  In practice, this 
'update' is actually just a removal of a single TermPosition for a given Term 
for a given doc Id.

I don't think this is currently possible, but would it be easy to change Lucene 
to support this type of usage?

The reason for this is to optimise my index usage.  I'm using Lucene to index 
arbitrary data sets.  In some data sets, however, each Document is indexed once 
for each user who has an interest in the document.  For example, with mail data, 
a mail item (with a single recipient) is stored as two Documents, once with the 
'user' field set to the sender's user Id and again with the 'user' field set to 
the recipient's user Id.  Searches just filter mail for a given user by the 
'user' field.

When one of those users deletes the mail, the Document whose 'user' field holds 
that user's Id is simply deleted.  One of the original reasons for doing this 
was to enable horizontal partitioning of the index.  This works nicely, but of 
course the index is bigger than necessary and the number of term positions is 
at least double what is necessary.
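
To make the current scheme concrete, here is a minimal sketch in plain Java. 
It is an in-memory stand-in, not Lucene code: the class PerUserMailIndex and 
its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of the "one Document per interested user" scheme.
// This is NOT the Lucene API; it only mirrors the bookkeeping described above.
class PerUserMailIndex {

    // One entry per (mail, user) pair, standing in for one Document.
    static final class Entry {
        final String mailId;
        final String user;
        final String body;

        Entry(String mailId, String user, String body) {
            this.mailId = mailId;
            this.user = user;
            this.body = body;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Index the same mail once for every user with an interest in it.
    void indexMail(String mailId, String body, String... users) {
        for (String user : users) {
            entries.add(new Entry(mailId, user, body));
        }
    }

    // A user deleting the mail removes only that user's copy.
    void deleteForUser(String mailId, String user) {
        entries.removeIf(e -> e.mailId.equals(mailId) && e.user.equals(user));
    }

    // Search is a content match filtered by the 'user' value.
    List<String> search(String user, String word) {
        List<String> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.user.equals(user) && e.body.contains(word)) {
                hits.add(e.mailId);
            }
        }
        return hits;
    }

    // Number of stored copies; grows with the number of interested users.
    int size() {
        return entries.size();
    }
}
```

Deletion is cheap and needs no knowledge of the other user, but every mail 
body is held once per interested user.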

I had originally thought to index the data once, with the 'user' field set to 
both the sender's and the recipient's user Id, but when the sender or recipient 
deletes the mail from their mailbox, searching becomes more complicated as the 
index does not reflect the external database state unless the mail is reindexed.
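
The alternative can be sketched the same way, again as a plain Java model 
rather than Lucene code. SharedMailIndex, removeUser and reindexCount are 
hypothetical names, and the model assumes an indexed entry cannot be edited in 
place, so dropping one user means removing the entry and adding it again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the "index once, many user values" scheme.
// This is NOT the Lucene API.
class SharedMailIndex {

    // One entry per mail; 'users' stands in for a multi-valued 'user' field.
    static final class Entry {
        final String body;
        final Set<String> users;

        Entry(String body, Set<String> users) {
            this.body = body;
            this.users = users;
        }
    }

    private final Map<String, Entry> entries = new LinkedHashMap<>();

    // Counts how often a whole mail had to be indexed again.
    int reindexCount = 0;

    void indexMail(String mailId, String body, String... users) {
        entries.put(mailId, new Entry(body, new LinkedHashSet<>(Arrays.asList(users))));
    }

    // Assumption: no in-place edit, so remove the entry and re-add it without 'user'.
    // The model keeps the body around; a real index may need the original text again.
    void removeUser(String mailId, String user) {
        Entry old = entries.remove(mailId);
        if (old == null) {
            return;
        }
        Set<String> remaining = new LinkedHashSet<>(old.users);
        remaining.remove(user);
        if (!remaining.isEmpty()) {
            indexMail(mailId, old.body, remaining.toArray(new String[0]));
            reindexCount++;
        }
    }

    List<String> search(String user, String word) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Entry> e : entries.entrySet()) {
            if (e.getValue().users.contains(user) && e.getValue().body.contains(word)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }
}
```

Each mail is held once, but a single user's delete costs a full reindex of 
that mail, which is the step that removing one TermPosition would avoid.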

Is this something others have wanted, or are there other solutions to this problem?
