lucene-dev mailing list archives

From "Gus Heck (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores
Date Sun, 21 Feb 2016 20:04:18 GMT

    [ https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156203#comment-15156203
] 

Gus Heck commented on SOLR-8349:
--------------------------------

*WRT #3/deterministic behavior*: Here's the use case:

# server is started, it loads a component that loads a file and creates resource A version 1 in memory
# some time later the file is updated, and these updates need to be deployed
# the new version 2 of the file is deployed to the server and the core is unloaded
# the core is then loaded again and brought on line and made available to users.

We now cannot predict which version of the resource is available to the users. If a GC occurred
and the resource was collected between steps 3 and 4, the new resource will become available
as the user would expect. If not, the old resource will show up on calls to getResource()
until a GC occurs in which the JVM decides to clear the weak reference to it. If the component
caches a (hard) reference to the resource, the new version of the resource will never get
loaded. The previous system without weak references did not allow the old resource to ever
be unloaded (and hence was deterministic). Now the behavior is a product of GC timing and
the internal aspects of how the component was programmed. I would like to subsequently (in
some later patch) make it possible to refresh the resource in a predictable manner without
restarting the whole node.
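
To make the timing dependence concrete, here is a minimal Java sketch of a registry backed by weak references. The class WeakResourceRegistry, its getResource(key, loader) signature, and simulateCollection() are hypothetical names chosen for illustration, not the API in the attached patch. simulateCollection() only stands in for the GC clearing the reference, which a real JVM does at a time of its own choosing.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for a container-level shared-resource registry.
final class WeakResourceRegistry {
    private final Map<String, WeakReference<Object>> resources = new ConcurrentHashMap<>();

    // Returns the cached resource while its weak reference is still live;
    // otherwise runs the loader and caches the freshly loaded resource.
    synchronized Object getResource(String key, Supplier<Object> loader) {
        WeakReference<Object> ref = resources.get(key);
        Object cached = (ref == null) ? null : ref.get();
        if (cached != null) {
            return cached; // old version wins if nothing has cleared the reference
        }
        Object fresh = loader.get();
        resources.put(key, new WeakReference<>(fresh));
        return fresh;
    }

    // Assumption for the demo: clears the reference explicitly to imitate
    // what a garbage collection may or may not have done between steps 3 and 4.
    synchronized void simulateCollection(String key) {
        WeakReference<Object> ref = resources.get(key);
        if (ref != null) {
            ref.clear();
        }
    }
}
```

In this sketch, a getResource() call made after the file has changed keeps returning the version 1 object for as long as the weak reference has not been cleared, and a component holding a hard reference guarantees it never will be. Only once the reference is cleared does the loader run again and produce version 2.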

*WRT hard references*: I want people to have success not missteps and re-implementation using
my feature :). For this reason I really like the weak references suggestion you made, but
I want to manage it for them and not burden them with handling it properly. The submitted
approach was meant to not bite the user who writes a component that never holds a reference
to the resource. This would be a reasonable naive implementation for someone who knows nothing
about the internals of Solr and assumes they shouldn't hold the reference, to ensure that the
same resource is always seen everywhere.

*WRT the abstraction*: it's there to get the loading code added to the deferredCallables list.
 SolrResourceLoader has no knowledge of the SolrCore until the core calls inform(core) on
it. Unfortunately inform(resourceLoader) gets called before that. So any attempt to cast and
do ((SolrResourceLoader)loader).getCore().getContainer() in the implementation of ResourceLoader#inform(loader)
will throw an NPE. That's why the deferredCallables list exists. I chose to add the abstraction
to enable the loader/core to manage hard references and allow the processing to become uniform
with all loads being deferred. I wanted the folks attempting to use this to have a clear intuitive
path to do so and the interfaces are meant to guide them into doing the right thing without
needing to know all the details.
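
The deferral idea can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the code in the patch: DeferredLoaderSketch, defer(), and informCore() are hypothetical names, a plain Object stands in for SolrCore, and running late registrations immediately is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical loader that queues work needing the core until the core is known.
final class DeferredLoaderSketch {
    private Object core; // stands in for SolrCore; null until informCore() is called
    private final List<Callable<Void>> deferredCallables = new ArrayList<>();

    // Queues the work if the core is not yet known, otherwise runs it right away.
    synchronized void defer(Callable<Void> work) {
        if (core == null) {
            deferredCallables.add(work);
        } else {
            run(work);
        }
    }

    // Records the core, then runs every queued callable in registration order.
    synchronized void informCore(Object core) {
        this.core = core;
        for (Callable<Void> work : deferredCallables) {
            run(work);
        }
        deferredCallables.clear();
    }

    synchronized Object getCore() {
        return core;
    }

    private static void run(Callable<Void> work) {
        try {
            work.call();
        } catch (Exception e) {
            throw new IllegalStateException("deferred load failed", e);
        }
    }
}
```

A component that calls getCore() directly before informCore() gets null back, which is the source of the NPE described above. Wrapping the same access in defer() postpones it until the core has been supplied.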

It's worth noting that if the goal is a simple patch, the way to eliminate the MOST complexity
from the patch is to have the component author manage references, and change: 
{code}
      resourceLoader.inform(resourceLoader);
      resourceLoader.inform(this); // last call before the latch is released.
{code}
 to
{code}
      resourceLoader.inform(this); 
      resourceLoader.inform(resourceLoader); // last call before the latch is released.
{code}

In that case, casting and navigating to the container in inform(ResourceLoader) will work
and we can lose the abstractions, the deferred callables and associated latch/synchronization,
and the object reference code goes away too... but I definitely don't feel qualified to change
the order in which components are made aware of things. I have no idea if any code out there
would be relying on this order of inform() calls in some way. 

Lastly, Object keys are certainly possible, though this does reintroduce a vector for class
loader memory leaks as previously discussed. I left this out because we were not supporting
the lucene analyzers yet, and I wasn't yet adding "automatic" keys from configuration nodes.
Automatic keys would be a nice addition so that implementors don't need to think so hard to
use the feature. I'm amenable to adding that now if you like, though the option to supply
one's own key should remain.


> Allow sharing of large in memory data structures across cores
> -------------------------------------------------------------
>
>                 Key: SOLR-8349
>                 URL: https://issues.apache.org/jira/browse/SOLR-8349
>             Project: Solr
>          Issue Type: Improvement
>          Components: Server
>    Affects Versions: 5.3
>            Reporter: Gus Heck
>         Attachments: SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large dictionary or
> other in-memory structure. When multiple cores are loaded with identical configurations utilizing
> this large in-memory structure, each core holds its own copy in memory. This has been noted
> in the past and a specific case reported in SOLR-3443. This patch provides a generalized capability,
> and if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

