lucene-dev mailing list archives

From "Domenico Fabio Marino (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-10880) Support replica filtering by tag
Date Tue, 22 Aug 2017 15:47:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16136969#comment-16136969 ]

Domenico Fabio Marino commented on SOLR-10880:
----------------------------------------------

h2. *Current implementation:*
Replicas need to be "tagged" using replica properties; for example, the property "flavour"
is set to "banana|vanilla".

Requests then name the property to be looked up (in this case "flavour") using the parameter
"replica.tag.name", and specify that they "like" a value for that property (that is, they are
interested exclusively in the replicas that have that value).
Example:
{code:java}
replica.tag.name=flavour&replica.like=banana
{code}

Alternatively, requests can specify that they want only the replicas that do not match a value ("dislike").
Example:

{code:java}
replica.tag.name=flavour&replica.dislike=vanilla
{code}
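
To make the like/dislike semantics concrete, here is a minimal sketch of the matching logic. It is not the code from the patch: the class and method names are hypothetical, and it assumes that a replica without the property is treated as having no tags.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr: illustrates like/dislike matching only.
class TagMatchSketch {

    // Splits a pipe-delimited tag value such as "banana|vanilla" into its tags.
    // Assumption: a missing or empty property means "no tags".
    static List<String> splitTags(String value) {
        if (value == null || value.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(value.split("\\|"));
    }

    // Returns the replicas that carry `tag` (like == true)
    // or that do not carry it (like == false).
    static List<String> select(Map<String, String> tagValueByReplica, String tag, boolean like) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, String> e : tagValueByReplica.entrySet()) {
            boolean hasTag = splitTags(e.getValue()).contains(tag);
            if (hasTag == like) {
                selected.add(e.getKey());
            }
        }
        return selected;
    }
}
```

With "flavour" values of "banana|vanilla", "banana" and "chocolate" on three replicas, liking "banana" would select the first two, and disliking "vanilla" would select the last two.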

However, this behaviour is not very extensible and does not provide enough support for SOLR-10610.

h2. *Proposal:*
Following the suggestions from Tomás, Christine and I tried to come up with a solution that
is both extensible and practical from a user's point of view. It is described below.

Please note that this is just a proposal and the code has not been written yet (although it
shouldn't differ too much from the current implementation).

Replicas have to be tagged using replica properties (separated by | in this example):
Example:

shard1replica1 has birdColour=yellow and region=EMEA
shard2replica2 has region=US
shard3replica1 has birdColour=red

In order to use shard filtering, requests need to set the parameter {noformat}filterByReplicaProp{noformat}
to true.
This is needed because the computation for property filtering can be expensive (with a large number
of replicas or properties) and the overhead may be noticeable.

h3. To use the replicas that have a specific property set to a specific value ("filter")

The request should then have the parameter {noformat}shards.filter{noformat} set to {noformat}replicaProp.PROPERTY_NAME:PROPERTY_VALUE{noformat}
Example:
{code:java}
filterByReplicaProp=true&shards.filter=replicaProp.region:EMEA
{code}
This means that the replica properties need to be inspected and that the request should only
be executed on replicas that have the property {noformat}region{noformat} set to {noformat}EMEA{noformat}.
Given the tag setup above, this yields only shard1replica1.
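
As an illustration only (nothing here has been written for the patch), the filter could be applied roughly as follows. The class, the method names and the choice of IllegalArgumentException are assumptions made for this sketch; replicas are modelled as plain maps rather than Solr's Replica class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Solr code.
class ShardsFilterSketch {
    static final String PREFIX = "replicaProp.";

    // The example tagging used in this proposal.
    static Map<String, Map<String, String>> exampleReplicas() {
        Map<String, Map<String, String>> replicas = new LinkedHashMap<>();
        replicas.put("shard1replica1", Map.of("birdColour", "yellow", "region", "EMEA"));
        replicas.put("shard2replica2", Map.of("region", "US"));
        replicas.put("shard3replica1", Map.of("birdColour", "red"));
        return replicas;
    }

    // Applies a shards.filter value such as "replicaProp.region:EMEA" and
    // returns the names of the replicas whose property has exactly that value.
    static List<String> filter(Map<String, Map<String, String>> replicas, String filterValue) {
        int colon = filterValue.indexOf(':');
        if (!filterValue.startsWith(PREFIX) || colon < PREFIX.length()) {
            throw new IllegalArgumentException("expected replicaProp.NAME:VALUE, got " + filterValue);
        }
        String name = filterValue.substring(PREFIX.length(), colon);
        String value = filterValue.substring(colon + 1);
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> e : replicas.entrySet()) {
            if (value.equals(e.getValue().get(name))) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }
}
```

Calling filter(exampleReplicas(), "replicaProp.region:EMEA") keeps only shard1replica1, matching the outcome described above.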

h3. To use the replicas that do not have a specific property set to a specific value:
The request should have the parameter {noformat}shards.filterNot{noformat} set to {noformat}replicaProp.PROPERTY_NAME:PROPERTY_VALUE{noformat}
Example:
For the purpose of this example, let's suppose that there is a replica (shard3replica2) that
is under maintenance and is therefore tagged with:
{noformat}maintenance=yes{noformat}

Then the request would need to have:
{code:java}
filterByReplicaProp=true&shards.filterNot=replicaProp.maintenance:yes
{code}
This means that the replica properties need to be inspected and that the request should be
executed only on replicas that do not have "maintenance" set to "yes".
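
A minimal sketch of the exclusion, using hypothetical names and plain maps in place of Solr's Replica class. It assumes that a replica which does not carry the property at all is kept, since it does not have the excluded value.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch, not Solr code.
class ShardsFilterNotSketch {

    // Keeps every replica whose property `name` is not equal to `value`.
    // Assumption: replicas that lack the property are kept as well.
    static List<String> filterNot(Map<String, Map<String, String>> replicas,
                                  String name, String value) {
        return replicas.entrySet().stream()
                .filter(e -> !value.equals(e.getValue().get(name)))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

For shard3, with shard3replica1 untagged for maintenance and shard3replica2 tagged maintenance=yes, filterNot(replicas, "maintenance", "yes") would leave only shard3replica1.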

Using {noformat}shards.filter=replicaProp...{noformat} or {noformat}shards.filterNot=replicaProp...{noformat}
without specifying {noformat}filterByReplicaProp=true{noformat} will cause exceptions.
Using {noformat}filterByReplicaProp=true{noformat} without specifying a filter will not cause
exceptions but is fundamentally useless and wastes computation time.
Using filter or filterNot on a property that is not present on any replica is likely to cause
exceptions (this is an implementation detail).
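
The first two rules above could be checked up front along these lines. This is only a sketch: the class, the Outcome enum and the use of IllegalArgumentException are assumptions, as the proposal does not say which exception type would be thrown.

```java
import java.util.Map;

// Hypothetical sketch, not Solr code.
class FilterParamCheckSketch {
    enum Outcome { NO_FILTERING, FILTERING, FLAG_WITHOUT_FILTER }

    static boolean usesReplicaProp(String paramValue) {
        return paramValue != null && paramValue.startsWith("replicaProp.");
    }

    // Classifies a request's parameters according to the rules above.
    static Outcome check(Map<String, String> params) {
        boolean flag = Boolean.parseBoolean(params.get("filterByReplicaProp"));
        boolean hasFilter = usesReplicaProp(params.get("shards.filter"))
                || usesReplicaProp(params.get("shards.filterNot"));
        if (hasFilter && !flag) {
            // Assumption: the exception type is not specified by the proposal.
            throw new IllegalArgumentException(
                    "replicaProp filters require filterByReplicaProp=true");
        }
        if (flag && !hasFilter) {
            // Allowed, but the property inspection is wasted work.
            return Outcome.FLAG_WITHOUT_FILTER;
        }
        return hasFilter ? Outcome.FILTERING : Outcome.NO_FILTERING;
    }
}
```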

h3. An extension to this proposal for Canary (SOLR-10610):
Given a suitably tagged environment, requests that want to use the canary component will
have to specify both {noformat}filterByReplicaProp=true{noformat} and the {noformat}canary{noformat}
parameter set to {noformat}PROPERTY_NAME:PROPERTY_VALUE{noformat}, for example:
{code:java}
filterByReplicaProp=true&canary=birdColour:yellow
{code}
This means that the replica properties need to be inspected and that the canary component has
to use the "canaries" that have the property {noformat}birdColour{noformat} set to {noformat}yellow{noformat}.

The use of {noformat}canary{noformat} keeps shard filtering clearly separate from the canary
component while offering a similar feature.
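
For illustration, the value of the canary parameter could be split into its two halves as below. The class and method are hypothetical; note that, unlike shards.filter, the value carries no replicaProp. prefix.

```java
import java.util.Map;

// Hypothetical sketch, not Solr code.
class CanaryParamSketch {

    // Parses "PROPERTY_NAME:PROPERTY_VALUE" into a (name, value) pair.
    static Map.Entry<String, String> parse(String canary) {
        int colon = canary.indexOf(':');
        if (colon <= 0 || colon == canary.length() - 1) {
            throw new IllegalArgumentException(
                    "expected PROPERTY_NAME:PROPERTY_VALUE, got " + canary);
        }
        return Map.entry(canary.substring(0, colon), canary.substring(colon + 1));
    }
}
```

Parsing "birdColour:yellow" gives the property name "birdColour" and the value "yellow", which the canary component would then match against the replica properties.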

For further information on this component please refer to SOLR-10610.

Unfortunately, due to implementation details, we could not come up with a solution that did
not involve {noformat}filterByReplicaProp{noformat} or similar flags.
This is because HttpShardHandler (the only place that has access to all the replicas and their
properties) is executed before any component and hides the Replica class from the components
(the components are only given back a list of URLs, and finding the replicas associated with
each URL would be a clear violation of encapsulation and separation of concerns).
Furthermore, we do not want HttpShardHandler to care about what the components are going to
do with the replica properties, so as not to tie it to a specific implementation or add a myriad
of conditionals during its execution.

h3. Future extensions (out of scope for this patch):

* Replica type filtering could be supported via:
 {noformat}shards.filter=replicaType:PULL{noformat} which means: only use replicas whose type is PULL
* and similarly:
 {noformat}shards.filterNot=replicaType:NRT{noformat} which means: exclude all the replicas whose type is NRT

* Node role filtering is also a possible extension, for example:
 {noformat}shards.filter=nodeRole:analytics{noformat} which means: only use replicas whose role is analytics
* and similarly:
 {noformat}shards.filterNot=nodeRole:overseer{noformat} which means: exclude all the replicas whose role is overseer
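
One possible way to keep the filter syntax open to these extensions is to classify the key before the colon. The sketch below is purely illustrative: the class, the Kind enum and the field names are hypothetical, and only the three key forms mentioned in this comment are recognised.

```java
// Hypothetical sketch, not Solr code.
class FilterKeySketch {
    enum Kind { REPLICA_PROP, REPLICA_TYPE, NODE_ROLE }

    static final String PROP_PREFIX = "replicaProp.";

    final Kind kind;
    final String name;   // property name; empty for replicaType and nodeRole
    final String value;

    private FilterKeySketch(Kind kind, String name, String value) {
        this.kind = kind;
        this.name = name;
        this.value = value;
    }

    // Parses values such as "replicaProp.region:EMEA", "replicaType:PULL"
    // or "nodeRole:analytics".
    static FilterKeySketch parse(String filterValue) {
        int colon = filterValue.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("expected KEY:VALUE, got " + filterValue);
        }
        String key = filterValue.substring(0, colon);
        String value = filterValue.substring(colon + 1);
        if (key.startsWith(PROP_PREFIX)) {
            return new FilterKeySketch(Kind.REPLICA_PROP, key.substring(PROP_PREFIX.length()), value);
        }
        if (key.equals("replicaType")) {
            return new FilterKeySketch(Kind.REPLICA_TYPE, "", value);
        }
        if (key.equals("nodeRole")) {
            return new FilterKeySketch(Kind.NODE_ROLE, "", value);
        }
        throw new IllegalArgumentException("unknown filter key: " + key);
    }
}
```

The same parsed form could serve both shards.filter and shards.filterNot, with the caller deciding whether a match keeps or excludes a replica.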

> Support replica filtering by tag
> --------------------------------
>
>                 Key: SOLR-10880
>                 URL: https://issues.apache.org/jira/browse/SOLR-10880
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Domenico Fabio Marino
>         Attachments: SOLR-10880.patch, SOLR-10880.patch, SOLR-10880.patch, SOLR-10880.patch
>
>
> Add a mechanism to allow queries to use only a subset of replicas (by specifying the wanted
> replica tag).
> Some replicas have to be marked as tag before running the query.
> A query has to specify ShardParams.REPLICA_TAG_NAME to specify what property holds the
> tag it wants to use (for example "replica.tag") and then use ShardParams.REPLICA_TAG_LIKE
> "tagName" to tell the ShardHandler to only use the replicas matching tagName.
> A query can also use ShardParams.REPLICA_TAG_DISLIKE "tagName" to use all the replicas
> that do not match tagName.
> In order to properly use this system, replicas need to be tagged, tagging a replica involves
> setting the property ShardParams.REPLICA_TAG_NAME to a property name and then set that property
> in the replicas.
> An example can be seen in the ReplicaTagTest included in this patch where a dynamic cloud
> has some tags assigned to it both randomly and on a fixed basis.
> A replica can have multiple tags attached to it, and these tags are separated by default
> by "|" (pipe character), the delimiter can be changed by setting ShardParams.REPLICA_TAG_DELIMITER
> in the query to anything else.
> No validity check is performed on the tags, therefore one may get an array of shard URLs
> that contains empty URLs, or that is null (when the property does not exist), the user of this
> feature has to deal with it.
> The tag to replica mappings are rebuilt for each query that specifies ShardParams.REPLICA_TAG_NAME.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

