lucene-solr-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: need for re-indexing when using managed schema
Date Mon, 16 Dec 2019 18:47:00 GMT
That’s a little overstated. A full explanation of what’s safe and what’s not runs to several
pages and depends on what you mean by “safe”.

Any modification to a schema, even if it doesn’t cause something to outright break, may
leave the index in an inconsistent state. For instance, remember that Lucene and Solr really
don’t care if doc1 doesn’t have a particular field X and doc2 does. If you do something
as “safe” as adding a new field, only documents indexed after that change will have the field.
Your index will continue to function with no errors in that case, but searches on the
new field won’t return any docs indexed before the change until those older docs are re-indexed.
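
For illustration, adding a field through the Schema API amounts to POSTing an `add-field`
command to the collection's schema endpoint. The sketch below only builds the URL and the
JSON body. The host, the collection name (`mycollection`), the field name (`new_field_s`)
and the helper names are placeholders for this example, not anything from this thread.

```python
import json


def schema_url(base_url, collection):
    """Return the Schema API endpoint for a collection (names are placeholders)."""
    return f"{base_url.rstrip('/')}/{collection}/schema"


def add_field_command(name, field_type, stored=True, indexed=True):
    """Build the JSON body of a Schema API 'add-field' command."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,
            "stored": stored,
            "indexed": indexed,
        }
    }


# Assumed values; adjust to your own installation.
url = schema_url("http://localhost:8983/solr", "mycollection")
body = add_field_command("new_field_s", "string")

# The body would be sent as an HTTP POST with Content-Type: application/json.
print(url)
print(json.dumps(body))
```

Sending this changes only the schema. Documents already in the index are untouched, which
is why they have no value for the new field until they are indexed again.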

So you can see where this is going. If you add a field _and then reindex all your documents_,
it’s perfectly safe. However, between the time you add the field and the time re-indexing is
complete, your results may be inconsistent.
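
One way to watch that window close, sketched here under assumptions, is to count the
documents that still have no value for the new field. The query `*:* -field:[* TO *]`
matches documents lacking the field, assuming the field is indexed. The host, collection,
and field names are again placeholders, and the helper names are hypothetical.

```python
from urllib.parse import urlencode


def missing_field_params(field):
    """Query parameters that match docs with no value in `field`; rows=0 asks for the count only."""
    return {"q": f"*:* -{field}:[* TO *]", "rows": 0, "wt": "json"}


def select_url(base_url, collection, params):
    """Build a /select URL for the given collection and parameters."""
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


# Assumed values; adjust to your own installation.
params = missing_field_params("new_field_s")
progress_url = select_url("http://localhost:8983/solr", "mycollection", params)
print(progress_url)
```

In the JSON response, `response.numFound` is the number of documents still waiting to be
re-indexed. When it reaches zero, searches on the new field see every document.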

On the other hand, if you change, say, a DocValues field from multiValued="true" to multiValued="false",
the results are undefined _even if you reindex all your docs_.

On the other, other hand, if you delete a field, the metadata is still in your index. The
only way to get rid of it is to delete your index and re-index, or to index to a new collection.
Searches may also still return docs on the deleted field if it was created with a dynamic field
definition that’s still in the schema.

On the other, other, other hand… the list goes on and on.

Since even something as non-breaking as adding a new field requires you to re-index all
your older docs to get back to a consistent state, it’s just easiest to plan on
re-indexing all your docs whenever you change the schema. And, I’d also advise, index to
a new collection…

Best,
Erick

> On Dec 16, 2019, at 12:57 PM, Joseph Lorenzini <jaloren@gmail.com> wrote:
> 
> Hi all,
> 
> I have a question about the managed schema functionality. According to the
> docs, "All changes to a collection’s schema require reindexing". This would
> imply that if you use a managed schema and you use the schema API to update
> the schema, then doing a full re-index is necessary each time.
> 
> Is this accurate or can a full re-index be avoided?
> 
> Thanks,
> Joe

