lucene-solr-user mailing list archives

From Walter Underwood <>
Subject Re: How to update SOLR schema from continuous integration environment
Date Sat, 01 Nov 2014 16:24:39 GMT
Nice pictures, but that preso does not even begin to answer the question.

With master/slave replication, I do schema migration in two ways, depending on whether a field
is added or removed.

Adding a field:

1. Update the schema on the slaves. A defined field with no data is not a problem.
2. Update the master.
3. Reindex to populate the field and wait for replication.
4. Update the request handlers or clients to use the new field.
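For step 1, the schema change itself is just a new <field> entry in schema.xml; something like this (field name and type here are made-up examples):

```
<!-- schema.xml: a new, initially empty field (name/type are illustrative) -->
<field name="category" type="string" indexed="true" stored="true"/>
```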

Removing a field is the opposite. I haven’t tried lately, but Solr used to have problems
with a field that was in the index but not in the schema.

1. Update the request handlers and clients to stop using the field.
2. Reindex without any data for the field that will be removed, wait for replication.
3. Update the schema on the master and slaves.
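Between steps 2 and 3 it can help to confirm the index really no longer contains the field; the Luke request handler reports which fields actually exist in the index. A sketch, with hypothetical host and core names:

```shell
# Hypothetical host and core; adjust for your environment.
CORE="mycore"
HOST="master"

# Build a Luke handler URL restricted to the field being removed.
luke_url() {
  echo "http://$1:8983/solr/$CORE/admin/luke?fl=$2&numTerms=0&wt=json"
}

# e.g. fetch with: curl -s "$(luke_url "$HOST" old_field)"
luke_url "$HOST" old_field
```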

I have not tried to automate this for continuous deployment. It isn’t a big deal for a single
server test environment. It is the prod deployment that is tricky.

Walter Underwood

On Nov 1, 2014, at 7:29 AM, Will Martin <> wrote:

> -----Original Message-----
> From: Jack Krupansky [] 
> Sent: Saturday, November 01, 2014 9:46 AM
> To:
> Subject: Re: How to update SOLR schema from continuous integration environment
> In all honesty, incrementally updating resources of a production server is a rather frightening
proposition. Parallel testing is always a better way to go - bring up any changes in a parallel
system for testing and then do an atomic "swap" - redirection of requests from the old server
to the new server and then retire the old server only after the new server has had enough
time to burn in and get past any infant mortality problems.
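> Assuming, say, an nginx front end (just one possible way to do the redirection), the atomic swap can be a one-line upstream change followed by a reload:

```
# nginx.conf fragment (hypothetical host names): point the upstream at the
# new cluster, then run `nginx -s reload` to swap traffic atomically.
upstream solr_backend {
    # server solr-old:8983;   # retire only after the new server burns in
    server solr-new:8983;
}
```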
> That's production. Testing and dev? Who needs the hassle; just tear the old server down
and bring up the new server from scratch with all resources updated from the get-go.
> Oh, and the starting point would be keeping your full set of config and resource files
under source control so that you can carefully review changes before they are "pushed", can
compare different revisions, and can easily back out a revision with confidence rather than
"winging it."
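> The source-control workflow can be as simple as keeping the conf directory in git, so every change is reviewable and revertible. A sketch (paths and file contents are illustrative only):

```shell
# Keep the core's conf directory under git; review diffs before pushing.
REPO="$(mktemp -d)"
cd "$REPO"
git init -q .
printf '<schema name="demo" version="1.5"/>\n' > schema.xml
git add schema.xml
git -c user.name=ci -c user.email=ci@example.invalid commit -qm "baseline schema"
printf '<!-- proposed change -->\n' >> schema.xml
git diff --name-only        # review this before pushing anywhere
```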
> That said, a lot of production systems these days are not designed for parallel operation
and swapping out parallel systems, especially for cloud and cluster systems. In these cases
the reality is more of a "rolling update", where one node at a time is taken down, updated,
brought up, tested, brought back into production, tested some more, and only after enough
burn in time do you move to the next node.
> This rolling update may also force you to sequence or stage your changes so that old
and new nodes are at least relatively compatible. So, the first stage would update all nodes,
one at a time, to the intermediate compatible change, and only when that rolling update of
all nodes is complete would you move up to the next stage of the update to replace the intermediate
update with the final update. And maybe more than one intermediate stage is required for more
complex updates.
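> The staged rolling update described above might be driven by a loop like this (node names and the commented-out deploy commands are placeholders for your environment):

```shell
#!/bin/sh
# Sketch of a one-node-at-a-time rolling update; stage 1 pushes the
# intermediate-compatible config to every node before stage 2 begins.
NODES="solr1 solr2 solr3"
for NODE in $NODES; do
    echo "stage 1: updating $NODE to the intermediate-compatible config"
    # lb_disable "$NODE"                 # drain traffic from this node
    # push_conf intermediate/ "$NODE"    # hypothetical deploy step
    # restart_and_smoke_test "$NODE" || exit 1
    # lb_enable "$NODE"
done
# Only after every node runs the intermediate config do you start stage 2
# with the final config, repeating the same loop.
```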
> Some changes might involve upgrading Java jars as well, in a way that might cause nodes to give incompatible results. In that case you may need to stage or sequence your Java changes as well, so that you don't make the final code change until you have verified that all nodes are running intermediate code that is compatible with both old nodes and new nodes.
> Of course, it all depends on the nature of the update. For example, adding more synonyms
may or may not be harmless with respect to whether existing index data becomes invalidated
and each node needs to be completely reindexed, or if query-time synonyms are incompatible
with index-time synonyms. Ditto for just about any analysis chain changes - they may be harmless,
they may require full reindexing, they may simply not work for new data (e.g., a synonym is
added in response to late-breaking news or an addition to a taxonomy) until nodes are updated,
or maybe some queries become slightly or somewhat inaccurate until the update/reindex is complete.
> So, you might want to have two stages of test system - one to just do a raw functional
test of the changes, like whether your new synonyms work as expected or not, and then the
pre-production stage which would be updated using exactly the same process as the production
system, such as a rolling update or staged rolling update as required. The closer that pre-production
system is run to the actual production, the greater the odds that you can have confidence
that the update won't compromise the production system.
> The pre-production test system might have, say, 10% of the production data and be only
10% the size of the production system.
> In short, for smaller clusters having parallel systems with an atomic swap/redirection
is probably simplest, while for larger clusters an incremental rolling update with thorough
testing on a pre-production test cluster is the way to go.
> -- Jack Krupansky
> -----Original Message-----
> From: Faisal Mansoor
> Sent: Saturday, November 1, 2014 12:10 AM
> To:
> Subject: How to update SOLR schema from continuous integration environment
> Hi,
> How do people usually update Solr configuration files from continuous integration environment
like TeamCity or Jenkins.
> We have multiple development and testing environments and use WebDeploy- and AwsDeploy-style
tools to remotely deploy code multiple times a day. To update Solr, I wrote a simple Node
server which accepts a conf folder over HTTP, updates the specified core's conf folder, and
restarts the Solr service.
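> For standalone cores, the copy-then-restart step can often be lightened: Solr's Core Admin API supports a RELOAD action that re-reads the conf directory without restarting the whole service. A sketch, with hypothetical host and core names:

```shell
# Hypothetical host and core; adjust for your environment.
CORE="mycore"
HOST="solr-test"

# Build a Core Admin RELOAD URL for one node.
reload_url() {
  echo "http://$1:8983/solr/admin/cores?action=RELOAD&core=$CORE"
}

# e.g.: rsync -r conf/ "$HOST:/var/solr/$CORE/conf/" && curl -s "$(reload_url "$HOST")"
reload_url "$HOST"
```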
> Does a standard tool exist for this use case? I know about the Schema REST API, but I
want to update all the files in the conf folder rather than just updating a single file
or adding or removing synonyms piecemeal.
> Here is the link for the node server I mentioned if anyone is interested.
> Thanks,
> Faisal 
