lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Tomás Fernández Löbbe (JIRA) <j...@apache.org>
Subject [jira] [Commented] (SOLR-13439) Make collection properties easier and safer to use in code
Date Tue, 14 May 2019 15:51:00 GMT

    [ https://issues.apache.org/jira/browse/SOLR-13439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839573#comment-16839573
] 

Tomás Fernández Löbbe commented on SOLR-13439:
----------------------------------------------

bq. This identical to my Case 1 if the core lives for 30 minutes?
OK. My point is comparing watchers vs cache. Since the patch is keeping watchers, this can
still work in the same way as long as you keep using watchers.

bq. What you say is true, but it's no worse than having a watch set
Exactly, which is why the option of not setting a watch exists for cases that only need infrequent
access.

bq. *TLDR;*This patch does pose a risk for systems with large numbers of collections that
also frequently update collection properties on many of those collections if that system was
not already using collectionPropertiesWatches (or only using them on a few collections) prior
to this patch (i.e. most updates to properties are ignored).

Not on only those cases. There are two more leak scenarios that I’d like to consider:
# Even if the {{PropsWatcher}} expires and it’s removed from the cache, ZooKeeper server
will still maintain the watch referenced to this zkClient, that means we just keep accumulating
them for every call to {{getCollectionProperties}}. If you do this to many collections and
collection properties changes are infrequent, this may become a problem. I believe the {{PropsWatcher}}
object won’t be garbage collected, since it includes the callback. 
# Even if you keep calling {{getCollectionProperties}} on the same collection, but after expiration,
you keep adding new watches, for the same path. That not only means leaks of watches in ZooKeeper
server and {{PropsWatcher}} objects in Solr, but also that, if at some point later in time
a collection property changes all those watches are going to fire at the same time. Even more,
if at the time of the fire, the collection is in the cache (because of a recent call to {{getCollectionProperties}}),
all those watches for the same path, will re-fetch the collections properties file and then
re-set. This arguably can happen today, if someone uses watchers in the same way (add/remove
watches rapidly), but the idea of the watch is to be longer lived (i.e. the life of the core),
while with the patch it would happen with components that just call {{getCollectionProperties}}
with some frequency (i.e. per request, in cases with low request rate)

WDYT about:
# Making the {{getCollectionProperties}} optionally set a “timed watch” (users can opt
out, in which case no zk watch will be set at all, similar to the current implementation in
master, but less tricky, since users are now opting out of caching).
# Replace the  “expiring cache” with the use of “timed watches”. A timed watch is
only removed when the zk watch fires (i.e. a change in collection properties) and the time
has expired. With this I mean:
    ## The properties will be in cache “at least” the time specified* (this is best effort.
On connection reset, we don't need to reset the timed watches, but we may need to remove leftover
properties)
    ## If a “timed watch” was used, the properties won’t be removed from cache until
the watch actually fires at least once after the expiration time. This can be considered a
leak too, but is consistent with the lifecycle of the zk watch and the PropsWatcher object.
This will prevent Solr from adding a new zk watch on the same collection.
    ## The logic is simple, we just need to modify {{PropsWatcher}} to receive an expiration
date, and make it check that date on {{process}}
    ## The expiration date can be pushed forward (maybe {{getCollectionProperties}} can receive
the “cacheDuration”, which is the min duration you want for the watch, a value <= 0
means “don’t cache/watch” while a number > 0 means the minimum duration to keep the
watch/cache)

Just an idea, I think this will cover the TRA use case while preventing those leaks and accommodating
all existing use cases.

> Make collection properties easier and safer to use in code
> ----------------------------------------------------------
>
>                 Key: SOLR-13439
>                 URL: https://issues.apache.org/jira/browse/SOLR-13439
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: master (9.0)
>            Reporter: Gus Heck
>            Assignee: Gus Heck
>            Priority: Major
>         Attachments: SOLR-13439.patch, SOLR-13439.patch
>
>
> (breaking this out from SOLR-13420, please read there for further background)
> Before this patch the api is quite confusing (IMHO):
>  # any code that wanted to know what the properties for a collection are could call zkStateReader.getCollectionProperties(collection)
but this was a dangerous and trappy API because that was a query to zookeeper every time.
If a naive user auto-completed that in their IDE without investigating, heavy use of zookeeper
would ensue.
>  # To "do it right" for any code that might get called on a per-doc or per request basis
one had to cause caching by registering a watcher. At which point the getCollectionProperties(collection) magically
becomes safe to use, but the watcher pattern probably looks famillar induces a user who hasn't
read the solr code closely to create their own cache and update it when their watcher is
notified. If the caching side effect of watches isn't understood this will lead to many in-memory
copies of collection properties maintained in user code.
>  # This also creates a task to be scheduled on a thread (PropsNotification) and induces
an extra thread-scheduling lag before the changes can be observed by user code.
>  # The code that cares about collection properties needs to have a lifecycle tied to
either a collection or something other object with an even more ephemeral life cycle such
as an URP. The user now also has to remember to ensure the watch is unregistered, or there
is a leak.
> After this patch
>  # Calls to getCollectionProperties(collection) are always safe to use in any code anywhere.
Caching and cleanup are automatic.
>  # Code that really actually wants to know if a collection property changes so it can
wake up and do something (autoscaling?) still has the option of registering a watcher that
will asynchronously send them a notification.
>  # Updates can be observed sooner via getCollectionProperties with no need to wait for
a thread to run. (vs a cache held in user code)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message