lucene-dev mailing list archives

From Alexandre Rafalovitch <>
Subject Re: Feature: Solr implicitly defined field types?
Date Fri, 04 Jan 2019 16:52:25 GMT
What if a system schema were loaded implicitly at startup?
Then, if a newly loaded schema is missing a type definition, that definition is
copied, at that time, into the specific schema. So, on the first
rewrite, those types (and only those that are used) will be written out.

This allows us to version the system types the same way we version
normal schemas. I agree with Gus that hidden configuration causes all
sorts of challenges.

And, for tooling purposes, there definitely needs to be a way to get
all definitions: explicit and implicit, used and merely available.
That also points towards something that already has a self-describing
mechanism (like the Schema API).
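A minimal Python sketch of this copy-on-first-reference idea, together with the tooling view that reports each definition as explicit or implicit and used or merely available. Everything here is hypothetical: `SYSTEM_TYPES`, `CollectionSchema` and its methods are illustrative names, and the class names in the table are examples rather than an authoritative list of Solr's built-in types.

```python
from dataclasses import dataclass, field

# Hypothetical "system schema" loaded implicitly at startup:
# type name -> definition attributes (illustrative, not Solr's real defaults).
SYSTEM_TYPES = {
    "int": {"class": "solr.IntPointField"},
    "string": {"class": "solr.StrField"},
    "boolean": {"class": "solr.BoolField"},
}


@dataclass
class CollectionSchema:
    explicit_types: dict  # types written in this collection's schema
    fields: dict          # field name -> type name
    copied_types: dict = field(default_factory=dict)  # system types copied in on use

    def resolve(self, type_name):
        """Return a type definition, copying a system type in on first reference."""
        if type_name in self.explicit_types:
            return self.explicit_types[type_name]
        if type_name not in self.copied_types:
            if type_name not in SYSTEM_TYPES:
                raise KeyError(f"unknown field type: {type_name}")
            # Copy (not alias) so the collection keeps its own versioned definition.
            self.copied_types[type_name] = dict(SYSTEM_TYPES[type_name])
        return self.copied_types[type_name]

    def load(self):
        """Resolve every field's type, as loading the schema would."""
        for type_name in self.fields.values():
            self.resolve(type_name)

    def written_types(self):
        """What the next schema rewrite would persist: explicit plus copied-in types."""
        return {**self.copied_types, **self.explicit_types}

    def describe_types(self):
        """Tooling view: (explicit|implicit, used|available) for every known type."""
        used = set(self.fields.values())
        report = {}
        for name in sorted(set(SYSTEM_TYPES) | set(self.explicit_types)):
            origin = "explicit" if name in self.explicit_types else "implicit"
            report[name] = (origin, "used" if name in used else "available")
        return report
```

For a schema with an explicit `text_en` type and a `count` field of type `int`, `written_types()` would contain only `int` and `text_en`, while `describe_types()` would still report `boolean` as implicit but merely available.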


On Fri, 4 Jan 2019 at 10:45, David Smiley <> wrote:
> I'm thinking this feature would be used conservatively, and thus only for primitive types
that have no interesting configuration, or for something you are really not
expected to change (the nest path of nested docs).  So you wouldn't feel you had to go read
the docs.  The schema might even have a comment listing the implicit field types
(a one-line, comma-delimited list).
> On Fri, Jan 4, 2019 at 10:34 AM Gus Heck <> wrote:
>> I'm perhaps slightly conservative with respect to configuration, but I'm not fond
of hidden configuration that I can't see. What I don't like is looking at a config file and
not seeing the full story. That means I have to read the config and ALSO go read some part
of the documentation that I've failed to memorize, and probably need to google to find, to
be fully aware of what's going on....  (And no, I didn't like it when some standard stuff disappeared
from solrconfig.xml a while back either.) Small changes of course seem reasonable, but the
further we drift into implicit things, especially if we get a collection of several implicit
things described in various disparate parts of the manual, the more cryptic the system becomes.
That's my opinion, YMMV.
>> -Gus
>> On Thu, Jan 3, 2019 at 2:57 PM David Smiley <> wrote:
>>> Broadly, you refer to "locale" issues.  Solr's way of dealing with this today
is the optional, configurable use of URPs (UpdateRequestProcessors).  The schemaless / data-driven mode has some
of these enabled; you can see them in its solrconfig.xml, including many date formats.  You can
look into that for further info if you like.  The primitive field types are not locale-sensitive.
>>> Update: It's looking like 8.0 will only employ this implicit field type mechanism
for _nest_path_, which probably won't be in the default schema.  Assuming it isn't, it'll
only be documented in the context of this particular feature.  It'd be nice to see the
mechanism expanded to more field types, and at that juncture it could/should be more broadly documented.  That
can wait until people have the energy to do it.
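As a sketch of where locale handling would sit: values are normalized before they reach a locale-insensitive primitive type, whether by a URP chain like the data-driven one or on the client. The Python below does it client-side; `LOCALE_CONVENTIONS` and the `normalize_*` functions are hypothetical, the hand-written separator and Boolean-word table stands in for real locale data, and none of this is Solr's URP API.

```python
# Hypothetical per-locale conventions; real code would use proper locale data.
LOCALE_CONVENTIONS = {
    "en": {"group": ",", "decimal": ".", "true": {"true", "yes"}, "false": {"false", "no"}},
    "de": {"group": ".", "decimal": ",", "true": {"wahr", "ja"}, "false": {"falsch", "nein"}},
}


def normalize_decimal(raw: str, locale: str) -> float:
    """Strip grouping separators, then map the local decimal mark to '.'."""
    conv = LOCALE_CONVENTIONS[locale]
    canonical = raw.strip().replace(conv["group"], "").replace(conv["decimal"], ".")
    return float(canonical)


def normalize_boolean(raw: str, locale: str) -> bool:
    """Map locale-specific Boolean words onto True/False."""
    conv = LOCALE_CONVENTIONS[locale]
    token = raw.strip().lower()
    if token in conv["true"]:
        return True
    if token in conv["false"]:
        return False
    raise ValueError(f"not a {locale} boolean: {raw!r}")


def normalize_document(doc: dict, types: dict, locale: str) -> dict:
    """Convert only the fields whose (hypothetical) type needs it; pass others through."""
    out = {}
    for name, value in doc.items():
        kind = types.get(name)
        if kind == "double":
            out[name] = normalize_decimal(value, locale)
        elif kind == "boolean":
            out[name] = normalize_boolean(value, locale)
        else:
            out[name] = value
    return out
```

Keeping this step outside the field type is what lets the primitive types stay locale-insensitive.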
>>> On Sun, Dec 30, 2018 at 4:54 AM Jörn Franke <> wrote:
>>>> Hi David,
>>>> I now get the idea, and yes, this makes sense. It would, though, require some
tutorial or best practices; e.g., overriding a platform data type may not make much sense,
since it may confuse new developers on an existing project who know Solr but then get a platform
type that does not have the default behavior.
>>>> Could you deal with different languages in platform types? E.g., for dates it
does not seem to be a problem, because Solr expects only one specific date format, so values need to
be converted beforehand somehow (maybe that conversion could also be part of a platform type),
but decimals and Boolean values are written differently in some languages.
>>>> On 30.12.2018 at 07:01, David Smiley <> wrote:
>>>> Thanks for your thoughtful response Jörn!
>>>> ...
>>>> On Sat, Dec 29, 2018 at 4:14 AM Jörn Franke <> wrote:
>>>>> I think it is a good idea, but I see some potential complexity for “deployment”
of collections. For instance, in environments where Solr is used as a shared platform amongst
several stakeholders, every time you deploy/modify a collection you need to make sure that
the platform types exist. If a type exists in the Test environment, then I need to make sure that
it also exists in acceptance/production. The problem is that the platform type could have
been defined by somebody else who has not yet updated the other environments (e.g., due to
project/sprint delays). Another issue arises if I move to another Solr cluster in the same environment.
Then I have to make sure that all platform types move with me.
>>>> RE "the platform type could have been defined by somebody else":  I'm not
imagining it'd be configurable, thus the "somebody else" is the Solr project/committers.
>>>> Otherwise, I think I get your point, but perhaps I don't.  It's the same
point for any use of some new feature of Solr.  If you use some new feature, you have to take
care that all Solr instances you deploy your configuration to can handle that new feature.
That's a fairly generic point that would apply to just about anything in Solr.
>>>>> A (minor) issue is that platform types may change (for whatever reason),
and then potentially all collections have to be reindexed, or we end up with different versions
of the same platform type, which doesn't make things easier.
>>>> Yes, it's possible.  Though I think that point is separate from the feature I
propose.  You're saying that you might want to use an "int" field and then one day realize
you want some newer/better definition of what an "int" is (e.g. trie -> points).  Sure.
That's true whether the field type is explicit or implicit.  There's nothing stopping you
from explicitly defining the field type if you want to; the names would not be reserved. If
you want to keep your current index while running the new Solr version, then you would keep
luceneMatchVersion at its current value, which would effectively retain the interpretation of the implicit
field types.
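One hypothetical way implicit types could honor luceneMatchVersion: keep a per-name history of definitions and pick the newest one introduced at or before the configured version, while an explicit definition with the same name always wins (since the names would not be reserved). The registry contents, including the 7.0 cut-over from trie to point classes, are illustrative only, and all function names are made up for this sketch.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, int]

# Hypothetical history of implicit definitions: name -> [(introduced_in, class)],
# oldest first. The 7.0 trie -> points cut-over is illustrative only.
IMPLICIT_TYPE_HISTORY: Dict[str, List[Tuple[Version, str]]] = {
    "int": [((6, 0), "solr.TrieIntField"), ((7, 0), "solr.IntPointField")],
    "long": [((6, 0), "solr.TrieLongField"), ((7, 0), "solr.LongPointField")],
}


def parse_version(text: str) -> Version:
    """Turn a luceneMatchVersion string such as "7.5.0" into (major, minor)."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def implicit_type_class(name: str, lucene_match_version: str) -> str:
    """Pick the newest definition introduced at or before luceneMatchVersion."""
    wanted = parse_version(lucene_match_version)
    chosen: Optional[str] = None
    for introduced, cls in IMPLICIT_TYPE_HISTORY[name]:
        if introduced <= wanted:
            chosen = cls
    if chosen is None:
        raise LookupError(f"no implicit {name!r} for luceneMatchVersion {lucene_match_version}")
    return chosen


def resolve_type_class(name: str, lucene_match_version: str,
                       explicit_types: Dict[str, str]) -> str:
    """Names are not reserved, so an explicit definition always wins."""
    if name in explicit_types:
        return explicit_types[name]
    return implicit_type_class(name, lucene_match_version)
```

Under these assumptions, an index configured with luceneMatchVersion 6.6.0 keeps resolving `int` to the trie class even on a newer Solr, which is the "retain the interpretation" behavior described above.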
>>>>> Currently we have all our schema definitions in a version management
system (we use the Schema API, but the JSON requests are kept there) so that projects can draw inspiration
from each other. Needless to say, careful type engineering also requires some technical
design documentation and may indeed be very collection-specific.
>>>>> Another issue could be that a platform type may also imply a certain
platform solrconfig.xml (e.g., lib directives).
>>>> I'm imagining platform types would be basic primitive types (int, boolean,
etc. and some special situations like in the issue I referenced).  They would not depend on
contrib libs... though I could imagine one day an evolution of this in which a contrib could
somehow auto-add implicit field types.
>>>>> I am not sure yet what the exact benefits are of referring to types of
other collections in the Solr runtime itself, instead of having a version system and letting
projects decide whether they want to adopt types from other collections, but maybe I am overlooking
something here.
>>>> The notion of implicit field types is not a cross-config (cross-collection)
thing.  Implicit field types are nothing more than built-in shortcuts.
>>>> One of my very early observations of Solr's schema was surprise
at seeing primitive types defined in the schema.  Consider SQL DDL statements that refer to
varchar and such.  Your DDL doesn't need to define what a varchar is!
>>>> Happy New Year,
>>>> ~ David
>>>>> On 28.12.2018 at 17:36, David Smiley <> wrote:
>>>>> While working on it, it
occurred to me that it would be nice if Solr had implicitly defined field types.  This would
allow you to define a field in your schema that refers to a type that is not also in your
schema, at least not explicitly (it need not be put explicitly in your schema.xml if you use the classic schema,
nor passed to the Schema API if you use that).  The idea would be that
these types would be Solr platform-provided field types that you need not define yourself.
>>>>> There are multiple ways this loose idea might be conceived / imagined
into a concrete proposal.
>>>>> (A) The main idea I'm kicking around right now is that Solr would _not_
throw an error when it reads a field definition whose type it doesn't see...
instead it would recognize that it's a platform type (via some built-in, hard-coded registry) and then
register that type on the fly.  So if you were to read the schema, you'd see it.  In this
way, it's kind of a shortcut.  Platform field types that you don't actually refer to will
never end up being put into your schema.
>>>>> (B) A schema could pre-initialize with the platform/implicit types. 
This is the simplest idea but I don't like it because you may not even need some of these
types.  I'm not going to go down this path now but wanted to mention it.
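To make the difference between (A) and (B) concrete, here is a hypothetical Python sketch: (A) adds a platform type only when some field refers to it, while (B) starts from every platform type and overlays the explicit ones. `PLATFORM_TYPES` and both functions are illustrative, and the class names are examples rather than a statement of what Solr ships.

```python
from typing import Dict

# Hypothetical built-in, hard-coded registry: type name -> field type class.
PLATFORM_TYPES: Dict[str, str] = {
    "int": "solr.IntPointField",
    "string": "solr.StrField",
    "_nest_path_": "solr.NestPathField",
}


def build_schema_lazy(explicit_types: Dict[str, str], fields: Dict[str, str]) -> Dict[str, str]:
    """Option (A): register a platform type on the fly, only when a field refers to it."""
    types = dict(explicit_types)
    for field_name, type_name in fields.items():
        if type_name in types:
            continue
        if type_name not in PLATFORM_TYPES:
            raise ValueError(f"field {field_name!r} refers to unknown type {type_name!r}")
        types[type_name] = PLATFORM_TYPES[type_name]
    return types


def build_schema_eager(explicit_types: Dict[str, str], fields: Dict[str, str]) -> Dict[str, str]:
    """Option (B): pre-initialize with every platform type, then overlay explicit ones."""
    types = {**PLATFORM_TYPES, **explicit_types}
    for field_name, type_name in fields.items():
        if type_name not in types:
            raise ValueError(f"field {field_name!r} refers to unknown type {type_name!r}")
    return types
```

With a single `_nest_path_` field, (A) yields just that type plus the explicit ones, whereas (B) also carries `int` and `string` even though nothing uses them, which is the drawback mentioned for (B).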
>>>>> I'm exploring (A) right now... I'm hoping to do this for at least a "_nest_path_"
field in support of nested documents in 8.0, but conceivably the idea would be expanded to
lots of things in our base schema right now (int, str, etc.)
>>>>> --
>>>>> Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
>>>>> LinkedIn: | Book:
