lucene-java-user mailing list archives

From Antony Bowesman <>
Subject Rebuilding parallel indexes
Date Mon, 09 Jun 2008 14:48:47 GMT
I have a design where I will be using multiple index shards, each holding
approx 7.5 million documents, with one shard per month over many years. These
will be large, static, read-only indexes, but the corresponding smaller
parallel index for each shard will change frequently.

I understand from previous replies by Hoss that the technique to handle this is 
to use parallel indexes where the parallel index gets rebuilt periodically with 
the changing data.

However, this 'periodically' needs to be quite frequent to make changes show up
responsively in the index, potentially several times a day. One problem is that
there can be updates to any of the data in almost any month, so an update by a
user to 120 documents, one document per month for 10 years, requires a full
rebuild of the 120 index shards of 7.5m docs each...
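
To put numbers on that worst case, here is a minimal arithmetic sketch in Java. It uses only the figures quoted above (7.5 million documents per monthly shard, 10 years of shards, one changed document per month); the class and method names are made up for illustration and are not part of Lucene.

```java
public class RebuildCost {
    // Figures from the message: 7.5 million documents per monthly shard,
    // one shard per month for 10 years.
    static final long DOCS_PER_SHARD = 7_500_000L;
    static final int SHARDS = 12 * 10;

    /** Documents that must be re-indexed when every touched shard is rebuilt in full. */
    static long docsReindexed(int shardsTouched) {
        return DOCS_PER_SHARD * shardsTouched;
    }

    public static void main(String[] args) {
        int changedDocs = 120; // one changed document in each monthly shard
        long rebuilt = docsReindexed(SHARDS);
        System.out.println("Documents changed:     " + changedDocs);
        System.out.println("Documents re-indexed:  " + rebuilt);
        System.out.println("Re-indexed per change: " + rebuilt / changedDocs);
    }
}
```

Under these assumptions, 120 changed documents force 900 million documents to be re-indexed, i.e. 7.5 million per actual change.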

I was wondering what the technical reasons are that prevent a 'delete+add' from
re-using the original docId, which would keep the two parallel indexes in sync
without requiring a rebuild.
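
The loss of sync can be illustrated with a toy model. This is only a sketch under stated assumptions: it assumes an append-only index in which a docId is simply the document's position, a delete only marks a slot, and a merge compacts marked slots away. ToyIndex and its methods are hypothetical and are not Lucene's API.

```java
import java.util.ArrayList;
import java.util.List;

public class DocIdDrift {
    /** Toy append-only index (hypothetical): a docId is the position in the list. */
    static class ToyIndex {
        private final List<String> docs = new ArrayList<>();
        private final List<Boolean> deleted = new ArrayList<>();

        /** Appends a document and returns its docId. */
        int add(String key) {
            docs.add(key);
            deleted.add(false);
            return docs.size() - 1;
        }

        /** Marks the live document with this key as deleted; its slot stays in place. */
        void delete(String key) {
            int id = idOf(key);
            if (id >= 0) deleted.set(id, true);
        }

        /** Compacts out deleted slots, so the surviving docIds shift down. */
        void merge() {
            List<String> live = new ArrayList<>();
            for (int i = 0; i < docs.size(); i++) {
                if (!deleted.get(i)) live.add(docs.get(i));
            }
            docs.clear();
            deleted.clear();
            for (String key : live) add(key);
        }

        /** Returns the docId of the live document with this key, or -1 if absent. */
        int idOf(String key) {
            for (int i = 0; i < docs.size(); i++) {
                if (!deleted.get(i) && docs.get(i).equals(key)) return i;
            }
            return -1;
        }
    }

    public static void main(String[] args) {
        ToyIndex main = new ToyIndex();     // large static index
        ToyIndex parallel = new ToyIndex(); // small, frequently changing index
        String[] keys = {"a", "b", "c"};
        for (String k : keys) {
            main.add(k);
            parallel.add(k);
        }

        // Update "a" in the parallel index only, as delete + add, then merge.
        parallel.delete("a");
        parallel.add("a");
        parallel.merge();

        for (String k : keys) {
            System.out.println(k + ": main docId=" + main.idOf(k)
                    + ", parallel docId=" + parallel.idOf(k));
        }
    }
}
```

In this toy model a single delete+add followed by a merge leaves all three keys at different positions in the two indexes, which is the mismatch that a rebuild of the parallel index is meant to repair.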

If this could be overcome, this would make this parallel index pattern so much 
more useful for large volume data sets.

Any thoughts?
