lucene-dev mailing list archives

From "Gus Heck (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-13131) Category Routed Aliases
Date Thu, 10 Jan 2019 19:58:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-13131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739731#comment-16739731 ]

Gus Heck commented on SOLR-13131:
---------------------------------

h1. Functionality
h2. New Parameter Value

*router.name* would gain a new valid value of "category"
h2. New Params

This feature would need some safety valves on it to avoid runaway collection creation (similar in
spirit to router.maxFutureMs for TRAs). To that end I suggest:
 # *router.maxCardinality* to place a limit on the total number of collections that can be
created (maybe required?)
 # *router.mustMatch* to provide pattern matching for valid data and reject requests that
would create an undesired collection (optional)
 # *router.dictionary* might also be added to provide a set of acceptable
values (optional). This may or may not be implemented as part of this ticket.
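
To make the first two guards concrete, here is a minimal sketch in Python (for brevity only; the real implementation would live in Solr's Java code). The function name, the exception type, and the choice to treat *router.mustMatch* as a whole-value regular expression are all assumptions made for illustration, not part of the proposal.

```python
import re


class RouteRejected(ValueError):
    """Raised when a guard forbids creating a collection (hypothetical type)."""


def check_new_category(value, existing, max_cardinality, must_match=None):
    """Return True if `value` needs a new collection, False if one exists.

    `existing` is the set of category values that already have a collection.
    Assumption: must_match has to match the whole value, not a substring.
    """
    if value in existing:
        return False  # already routed, nothing is created
    if must_match is not None and re.fullmatch(must_match, value) is None:
        raise RouteRejected(f"{value!r} does not match {must_match!r}")
    if len(existing) >= max_cardinality:
        raise RouteRejected(f"limit of {max_cardinality} collections reached")
    return True
```

In this sketch the pattern check runs before the cardinality check, so a malformed value is reported as malformed even when the alias is already full. That ordering is a design choice, not something the proposal specifies.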

With respect to router.dictionary, I could imagine there being a desire to have that dictionary
used as a spell checker for segments of the values. One could break the value on _ (or something
else) and make sure all the parts are spelled properly. One could also imagine the dictionary
being applied to specific matching groups from router.mustMatch, but all of this dictionary
based checking would be a future enhancement. I'm mentioning it here to get the idea out there
for future reference.
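
As a rough illustration of the segment-wise dictionary idea (a possible future enhancement, not something this ticket commits to), the check could look like the Python sketch below. The function name and the default separator are hypothetical.

```python
def unknown_segments(value, dictionary, separator="_"):
    """Return the segments of `value` that are not found in `dictionary`."""
    return [part for part in value.split(separator) if part not in dictionary]
```

A request would then be rejected whenever the returned list is non-empty, for example when a misspelled segment such as "kitchn" shows up in "thermostat_kitchn".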
h2. Routed Field Constraints

The data in the field to be routed will need to be constrained in a couple of ways to make this
work:
 # The routed field would need to be single valued, and encountering multiple values should
throw an error.
 # The value in the routed field must be convertible to a valid collection name. This conversion
will likely be done by replacing any invalid characters with '_', and it is the user's responsibility
to ensure that the resulting names are unique and do not interfere with other collections
in the system. A value that resolves to an existing collection that is not part of the alias
will cause an error to be returned; the existing collection will remain unaffected and will
not be added to the alias.
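
The two constraints above can be sketched as follows (Python for brevity; names are hypothetical). The set of characters treated as valid (letters, digits, '_', '-' and '.') is an assumption about collection naming rules, and the sketch uses the sanitized value directly as the collection name, whereas a real implementation would probably also incorporate the alias name.

```python
import re

# Assumption: anything other than letters, digits, '_', '-' and '.' is invalid.
_INVALID = re.compile(r"[^A-Za-z0-9_.\-]")


def to_collection_name(field_value):
    """Convert a routed field value to a candidate collection name."""
    if isinstance(field_value, (list, tuple)):
        if len(field_value) != 1:
            raise ValueError("routed field must hold exactly one value")
        field_value = field_value[0]
    return _INVALID.sub("_", str(field_value))


def resolve_target(field_value, alias_collections, all_collections):
    """Return the target name, refusing collections outside the alias."""
    name = to_collection_name(field_value)
    if name in all_collections and name not in alias_collections:
        raise ValueError(f"collection {name!r} exists but is not in the alias")
    return name
```

Note that distinct values can collapse to the same name under this scheme ("a/b" and "a b" both become "a_b"), which is why uniqueness is left as the user's responsibility.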

h2. Validations

In addition to constraints on the values, the following validations will be enforced at the
time the CategoryRoutedAlias is created:
 # The *collections* attribute is not set (applies only to non-routed aliases)
 # None of the TimeRoutedAlias attributes are present
 # TimeRoutedAliases will also be modified to validate that *router.maxCardinality* and *router.mustMatch*
are not set
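
A minimal sketch of these creation-time checks, in Python for brevity. The function name is hypothetical, and the set of time-only attributes listed here (router.start, router.interval, router.maxFutureMs) is an illustrative subset rather than a complete list.

```python
# Assumption: an illustrative subset of the attributes specific to each type.
TIME_ONLY = {"router.start", "router.interval", "router.maxFutureMs"}
CATEGORY_ONLY = {"router.maxCardinality", "router.mustMatch"}


def validate_alias_params(params):
    """Return a list of error messages; an empty list means the params pass."""
    errors = []
    router = params.get("router.name")
    if router == "category":
        if "collections" in params:
            errors.append("'collections' applies only to non-routed aliases")
        for key in sorted(TIME_ONLY.intersection(params)):
            errors.append(f"{key} is not valid for a category routed alias")
    elif router == "time":
        for key in sorted(CATEGORY_ONLY.intersection(params)):
            errors.append(f"{key} is not valid for a time routed alias")
    return errors
```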

h1. Implementation

The intention here is to first convert TimeRoutedAliasUpdateProcessor to RoutedAliasUpdateProcessor
and move as much time related functionality to the TimeRoutedAlias class as possible. If necessary,
TimeRoutedAliasUpdateProcessor may still remain as a (hopefully skinny) subclass of RoutedAliasUpdateProcessor. I
also hope to extract a RoutedAlias interface from TimeRoutedAlias, which will be implemented
by a new CategoryRoutedAlias class. Ideally I'd like to end up with a RoutedAliasUpdateProcessor
and two concrete RoutedAlias implementations, though I'm not sure if that will really be possible.
I'll break things down and make individual tickets for sub parts after I play with the code
a little.
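
The target shape described above might look roughly like this. It is sketched in Python only to show the structure; the actual classes would be Java, and the method name `target_collection`, the constructor arguments, and the `alias_value` naming scheme are all hypothetical. The time-based variant is simplified to daily buckets.

```python
from abc import ABC, abstractmethod
from datetime import datetime


class RoutedAlias(ABC):
    """Common contract extracted from TimeRoutedAlias (hypothetical shape)."""

    def __init__(self, alias_name, route_field):
        self.alias_name = alias_name
        self.route_field = route_field

    @abstractmethod
    def target_collection(self, doc):
        """Return the name of the collection that should receive `doc`."""


class TimeRoutedAlias(RoutedAlias):
    def target_collection(self, doc):
        ts = doc[self.route_field]  # expected to be a datetime
        return f"{self.alias_name}_{ts:%Y-%m-%d}"  # simplified: daily buckets


class CategoryRoutedAlias(RoutedAlias):
    def target_collection(self, doc):
        return f"{self.alias_name}_{doc[self.route_field]}"


class RoutedAliasUpdateProcessor:
    """Single processor that delegates all routing decisions to the alias."""

    def __init__(self, routed_alias):
        self.routed_alias = routed_alias

    def route(self, doc):
        return self.routed_alias.target_collection(doc)
```

With this split, the processor holds no time-specific logic, so adding another alias type only requires another RoutedAlias implementation.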

Both the v1 API and the v2 API will be supported.
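
For illustration, the sketch below builds what a create-alias request might look like in each API style, using the parameters proposed in this comment. It assumes that category aliases would reuse the existing CREATEALIAS action and *router.field* parameter the way time routed aliases do, and that the v2 body nests dotted keys under a "create-alias" command. The helper names, base URL, and example values are hypothetical, and collection-creation settings are omitted.

```python
import json
from urllib.parse import urlencode


def v1_url(base_url, params):
    """Build a v1-style Collections API URL (assumed CREATEALIAS action)."""
    query = {"action": "CREATEALIAS", **params}
    return f"{base_url}/admin/collections?{urlencode(query)}"


def v2_body(params):
    """Nest dotted keys ('router.name') into a JSON command body."""
    body = {}
    for key, value in params.items():
        *parents, leaf = key.split(".")
        node = body
        for parent in parents:
            node = node.setdefault(parent, {})
        node[leaf] = value
    return json.dumps({"create-alias": body})


example_params = {
    "name": "devices",
    "router.name": "category",
    "router.field": "device_type",
    "router.maxCardinality": 20,
    "router.mustMatch": "sensor_[a-z]+",
}
```

Calling `v1_url("http://localhost:8983/solr", example_params)` yields a URL-encoded query string, while `v2_body(example_params)` groups the router settings under a single "router" object.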
h1. Documentation
 # The TimeRoutedAliases page will be converted to a RoutedAliases page with sections for
TimeRoutedAliases and CategoryRoutedAliases
 # The CreateAliasCommand Documentation will be updated
 # The v2 API will return documentation for the new and modified attributes.

 

> Category Routed Aliases
> -----------------------
>
>                 Key: SOLR-13131
>                 URL: https://issues.apache.org/jira/browse/SOLR-13131
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: master (9.0)
>            Reporter: Gus Heck
>            Assignee: Gus Heck
>            Priority: Major
>
> This ticket is to add a second type of routed alias in addition to the current time routed
aliases. The new type of alias will allow data driven creation of collections based on the
values of a field and automated organization of these collections under an alias that allows
the collections to also be searched as a whole.
> The use case in mind at present is IoT device type segregation, but I could also see
this leading to the ability to direct updates to tenant specific hardware (in cooperation
with autoscaling). 
> This ticket also looks forward to (but does not include) the creation of a Dimensionally
Routed Alias which would allow organizing time routed data also segregated by device.
> Further design details to be added in comments.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


