lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Commented] (SOLR-4658) In preparation for dynamic schema modification via REST API, add a "managed" schema facility
Date Mon, 01 Apr 2013 11:35:15 GMT


Robert Muir commented on SOLR-4658:

And maybe it's not that the whole class needs to be abstract, just the implicit factory that's
currently done with static methods.

So it could have load/save or something simple like that. The default one today wouldn't have
any options at all and would throw UnsupportedOperationException (UOE) on save().
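
A minimal sketch of that shape, for illustration only. The names `Schema`, `SchemaFactory`, and `ClassicSchemaFactory` are hypothetical and are not taken from the attached patch; the point is just that the static factory methods become an overridable object with load/save, and that the default implementation refuses to save.

```java
// Hypothetical sketch; none of these names come from the SOLR-4658 patch.

/** Placeholder for a parsed schema; stands in for the real schema class. */
final class Schema {
    final String resourceName;

    Schema(String resourceName) {
        this.resourceName = resourceName;
    }
}

/** Replaces the implicit static factory methods with an overridable factory. */
abstract class SchemaFactory {
    abstract Schema load(String resourceName);

    abstract void save(Schema schema);
}

/** Default behavior: no options, read-only schema, save() is unsupported. */
final class ClassicSchemaFactory extends SchemaFactory {
    @Override
    Schema load(String resourceName) {
        // A real implementation would parse the named resource here.
        return new Schema(resourceName);
    }

    @Override
    void save(Schema schema) {
        throw new UnsupportedOperationException("schema is not mutable");
    }
}
```

A managed variant would then be another subclass whose save() actually persists the schema, leaving the rest of the class concrete.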

> In preparation for dynamic schema modification via REST API, add a "managed" schema facility
> --------------------------------------------------------------------------------------------
>                 Key: SOLR-4658
>                 URL:
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Steve Rowe
>            Assignee: Steve Rowe
>            Priority: Minor
>             Fix For: 4.3
>         Attachments: SOLR-4658.patch
> The idea is to have a set of configuration items in {{solrconfig.xml}}:
> {code:xml}
> <schema managed="true" mutable="true" managedSchemaResourceName="managed-schema"/>
> {code} 
> It will be a precondition for future dynamic schema modification APIs that {{mutable="true"}}.
> {{solrconfig.xml}} parsing will fail if {{mutable="true"}} but {{managed="false"}}.
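
The constraint between the two attributes could be checked along these lines. This is a sketch under assumptions: `SchemaSettings` is a hypothetical holder class, not Solr's actual config parsing code, and the exception type is chosen only for illustration.

```java
// Hypothetical sketch of the attribute check; not Solr's actual parsing code.
final class SchemaSettings {
    final boolean managed;
    final boolean mutable;
    final String managedSchemaResourceName;

    SchemaSettings(boolean managed, boolean mutable, String managedSchemaResourceName) {
        // mutable="true" is only meaningful for a managed schema, so reject
        // the combination mutable="true" with managed="false" up front.
        if (mutable && !managed) {
            throw new IllegalArgumentException(
                "mutable=\"true\" requires managed=\"true\"");
        }
        this.managed = managed;
        this.mutable = mutable;
        this.managedSchemaResourceName = managedSchemaResourceName;
    }
}
```
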
> When {{managed="true"}}, and the resource named in {{managedSchemaResourceName}} doesn't
> exist, Solr will automatically upgrade the schema to "managed": the non-managed schema resource
> (typically {{schema.xml}}) is parsed and then persisted at {{managedSchemaResourceName}} under
> {{$solrHome/$collectionOrCore/conf/}}, or on ZooKeeper at {{/configs/$configName/}}, and the
> non-managed schema resource is renamed by appending {{.bak}}, e.g. {{schema.xml.bak}}.
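
For the local-filesystem case, the upgrade steps above might be sketched as follows. This is an illustration under assumptions: `SchemaUpgrade` and `upgradeIfNeeded` are hypothetical names, the ZooKeeper case is not covered, and a plain byte copy stands in for the "parse, then persist" step.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the upgrade steps; not the code from the patch.
final class SchemaUpgrade {

    /**
     * Upgrades to a managed schema if the managed resource does not exist yet.
     * Returns true if an upgrade took place, false if nothing was done.
     */
    static boolean upgradeIfNeeded(Path confDir, String nonManagedName, String managedName)
            throws IOException {
        Path managed = confDir.resolve(managedName);
        if (Files.exists(managed)) {
            return false; // already managed, nothing to do
        }
        Path nonManaged = confDir.resolve(nonManagedName);

        // Stand-in for "parse the non-managed schema, then persist it":
        // this sketch simply copies the bytes.
        byte[] content = Files.readAllBytes(nonManaged);
        Files.write(managed, content);

        // Rename the non-managed resource by appending ".bak".
        Files.move(nonManaged, confDir.resolve(nonManagedName + ".bak"));
        return true;
    }
}
```

Calling `upgradeIfNeeded(confDir, "schema.xml", "managed-schema")` a second time is a no-op, because the managed resource then exists.
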
> Once the upgrade has taken place, users can get the full schema from the {{/schema?wt=schema.xml}}
> REST API, and can use this as the basis for modifications which can then be used to manually
> downgrade back to non-managed schema: put the {{schema.xml}} in place, then add
> {{<schema managed="false"/>}} to {{solrconfig.xml}} (or remove the whole {{<schema/>}} element,
> since {{managed="false"}} is the default).
> If users take no action, then Solr behaves the same as always: the example {{solrconfig.xml}}
> will include {{<schema managed="false" ...>}}.
> For a discussion of rationale for this feature, see []'s post
> to the solr-user mailing list in the thread "Dynamic schema design: feedback requested" []:
> {quote}
> Ignoring for a moment what format is used to persist schema information, I 
> think it's important to have a conceptual distinction between "data" that 
> is managed by applications and manipulated by a REST API, and "config" 
> that is managed by the user and loaded by solr on init -- or via an 
> explicit "reload config" REST API.
> Past experience with how users perceive(d) solr.xml has heavily reinforced 
> this opinion: on one hand, it's a place users must specify some config 
> information -- so people want to be able to keep it in version control 
> with other config files.  On the other hand it's a "live" data file that 
> is rewritten by solr when cores are added.  (God help you if you want to do 
> a rolling deploy of a new version of solr.xml where you've edited some of 
> the config values while clients are simultaneously creating new SolrCores)
> As we move forward towards having REST APIs that treat schema information 
> as "data" that can be manipulated, I anticipate the same types of 
> confusion, misunderstanding, and grumblings if we try to use the same 
> pattern of treating the existing schema.xml (or some new schema.json) as a 
> hybrid config & data file.  "Edit it by hand if you want, the /schema/* 
> REST API will too!"  ... Even assuming we don't make any of the same 
> technical mistakes that have caused problems with solr.xml round tripping 
> in the past (ie: losing comments, reading new config options that we 
> forget to write back out, etc...) i'm fairly certain there are still going 
> to be a lot of things that will look weird and confusing to people.
> (XML may have been designed to be both "human readable & writable" and 
> "machine readable & writable", but practically speaking it's hard to have a 
> single XML file be "machine and human readable & writable")
> I think it would make a lot of sense -- not just in terms of 
> implementation but also for end user clarity -- to have some simple, 
> straightforward to understand caveats about maintaining schema 
> information...
> 1) If you want to keep schema information in an authoritative config file 
> that you can manually edit, then the /schema REST API will be read only. 
> 2) If you wish to use the /schema REST API for read and write operations, 
> then schema information will be persisted under the covers in a data store 
> whose format is an implementation detail just like the index file format.
> 3) If you are using a schema config file and you wish to switch to using 
> the /schema REST API for managing schema information, there is a 
> tool/command/API you can run to do so.
> 4) If you are using the /schema REST API for managing schema information, 
> and you wish to switch to using a schema config file, there is a 
> tool/command/API you can run to export the schema info in a config file 
> format.
> ...whether or not the "under the covers in a data store" used by the REST 
> API is JSON, or some binary data, or an XML file that is just schema.xml w/o 
> whitespace/comments should be an implementation detail.  Likewise for the 
> question of whether some new config file formats are added -- it shouldn't 
> matter.
> If it's config it's config and the user owns it.
> If it's data it's data and the system owns it.
> : is the risk they take if they want to manually edit it - it's no 
> : different than today when you edit the file and do a Core reload or 
> : something. I think we can improve some validation stuff around that, but 
> : it doesn't seem like a show stopper to me.
> The new risk is multiple "actors" (both the user, and Solr) editing the 
> file concurrently, and info that might be lost due to Solr reading the 
> file, manipulating internal state, and then writing the file back out.  
> Eg: User hand edits may be lost if they happen on disk during Solr's 
> internal manipulation of data.  API edits may be reflected in the internal 
> state, but lost if the User writes the file directly and then does a core 
> reload, etc....
> : At a minimum, I think the user should be able to start with a hand 
> : modified file. Many people *heavily* modify the example schema to fit 
> : their use case. If you have to start doing that by making 50 rest API 
> : calls, that's pretty rough. Once you get your schema nice and happy, you 
> : might script out those rest calls, but initially, it's much 
> : faster/easier to whack the schema into place in a text editor IMO.
> I don't think there is any disagreement about that.  The ability to say 
> "my schema is a config file and i own it" should always exist (remove 
> it over my dead body) 
> The question is what trade offs to expect/require for people who would 
> rather use an API to manipulate these things -- i don't think it's 
> unreasonable to say "if you would like to manipulate the schema using an 
> API, then you give up the ability to manipulate it as a config file on 
> disk"
> ("if you want the /schema API to drive your car, you have to take your 
> foot off the pedals and let go of the steering wheel")
> {quote}

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
