lucene-java-user mailing list archives

From Andrzej Bialecki
Subject Re: Rebuilding parallel indexes
Date Mon, 09 Jun 2008 19:35:26 GMT
Antony Bowesman wrote:
> I have a design where I will be using multiple index shards to hold 
> approx 7.5 million documents per index per month over many years.  These 
> will be large static R/O indexes but the corresponding smaller parallel 
> index will get many frequent changes.
> I understand from previous replies by Hoss that the technique to handle 
> this is to use parallel indexes where the parallel index gets rebuilt 
> periodically with the changing data.
> However, this 'periodically' needs to be quite frequent to try to 
> provide responsive changes to the index, potentially several times a 
> day.  One problem is that there can be updates to any of the data in 
> almost any month, so an update by a user to 120 documents, one document 
> per month for 10 years, requires a full rebuild of the 120 index shards 
> of 7.5m docs each...
> I was wondering what the technical reasons were why a 'delete+add' could 
> not allow the original docId to be re-used, thus keeping the two 
> parallel indexes in sync without requiring a rebuild.
> If this could be overcome, this would make this parallel index pattern 
> so much more useful for large volume data sets.
> Any thoughts?

I have a thought ;) Perhaps you could use a FilterIndexReader to 
maintain a map between new IDs and old IDs, and remap on the fly. 
However, I think that some parts of Lucene depend on the fact that in a 
normal index the IDs are monotonically increasing, so this would 
complicate the issue.
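
To make the remapping idea concrete, here is a minimal sketch in plain Java. It deliberately uses no Lucene classes: DocIdRemapper and its methods are hypothetical names invented for this illustration, and the sketch assumes that a delete+add always places the re-added document at the next free ID at the end of the small index. A reader wrapper would consult such a table to translate IDs on the fly.

```java
/** Hypothetical helper for illustration only; not part of Lucene. */
public class DocIdRemapper {
    // stableToPhysical[stableId] = current doc ID inside the small, mutable index.
    // The stable ID is the one shared with the large read-only index.
    private final int[] stableToPhysical;
    private int nextPhysicalId;

    public DocIdRemapper(int docCount) {
        stableToPhysical = new int[docCount];
        for (int i = 0; i < docCount; i++) {
            stableToPhysical[i] = i; // freshly built indexes start out aligned
        }
        nextPhysicalId = docCount;
    }

    /** Record a delete+add; assumes the re-added doc is appended at the end. */
    public int update(int stableId) {
        int newPhysicalId = nextPhysicalId++;
        stableToPhysical[stableId] = newPhysicalId;
        return newPhysicalId;
    }

    /** Translate an ID from the static index into the mutable index. */
    public int toPhysical(int stableId) {
        return stableToPhysical[stableId];
    }

    /** True while walking stable IDs in order still visits physical IDs in order. */
    public boolean isMonotonic() {
        for (int i = 1; i < stableToPhysical.length; i++) {
            if (stableToPhysical[i] < stableToPhysical[i - 1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        DocIdRemapper remapper = new DocIdRemapper(5);
        remapper.update(1); // doc 1 is deleted and re-added
        System.out.println(remapper.toPhysical(1));
        System.out.println(remapper.isMonotonic());
    }
}
```

The isMonotonic check illustrates the caveat: after the first update, iterating in the static index's order no longer visits the small index in increasing ID order, which is the property some code may rely on.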

Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration  Contact: info at sigram dot com
