jakarta-jcs-dev mailing list archives

From Aaron Smuts <asm...@yahoo.com>
Subject Re: JCS remote cache client shutdown behaviour
Date Wed, 09 Sep 2009 14:30:15 GMT
I don't think that shutdown will properly kill the event queues on the server side.  The server
will queue events and retry delivery 10 times before killing the queue and marking it
non-functional.  If you have a large number of items and lots of clients that keep going down,
then you could see a memory spike.
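
To make the retry behaviour concrete, here is a minimal sketch of a delivery queue that gives up after a fixed number of failed attempts and then refuses new events. It is not the JCS CacheEventQueue code: the names RetryingEventQueue and DeliveryTarget are hypothetical, the limit of 10 is taken from the description above, and counting total attempts (rather than retries after the first attempt) is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical delivery target; stands in for a remote client listener.
interface DeliveryTarget {
    void handle(String event) throws Exception;
}

// Sketch of a queue that stops delivering after maxFailures consecutive failures.
class RetryingEventQueue {
    private final Queue<String> pending = new ArrayDeque<>();
    private final DeliveryTarget target;
    private final int maxFailures;
    private int failures = 0;
    private boolean working = true;

    RetryingEventQueue(DeliveryTarget target, int maxFailures) {
        this.target = target;
        this.maxFailures = maxFailures;
    }

    // Returns false once the queue is non-functional, so callers stop piling up events.
    boolean add(String event) {
        if (!working) {
            return false;
        }
        pending.add(event);
        return true;
    }

    // Delivers queued events in order, retrying the head event on failure.
    void drain() {
        while (working && !pending.isEmpty()) {
            String event = pending.peek();
            try {
                target.handle(event);
                pending.remove();
                failures = 0; // a success resets the failure count
            } catch (Exception e) {
                failures++;
                if (failures >= maxFailures) {
                    working = false; // mark the queue non-functional
                    pending.clear(); // release the backlog so it can be collected
                }
            }
        }
    }

    boolean isWorking() {
        return working;
    }

    int size() {
        return pending.size();
    }
}

class RetryingEventQueueDemo {
    public static void main(String[] args) {
        // A target that always fails, like a client that has gone away.
        RetryingEventQueue queue = new RetryingEventQueue(event -> {
            throw new java.net.ConnectException("Connection refused");
        }, 10);
        queue.add("RemoveEvent:key1");
        queue.drain();
        System.out.println("working=" + queue.isWorking() + " size=" + queue.size());
    }
}
```

In this sketch the backlog is bounded only if each failed attempt returns quickly. If an attempt blocks for a long time, events keep accumulating until the failure limit is finally reached, which would be consistent with the memory spike described above.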

We should look into the client shutdown process.  
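
One ordering worth considering for that shutdown process is "deregister with the server first, then tear down the local listener". The sketch below only illustrates that ordering with an in-memory stand-in for the server. ListenerRegistry and ClientListenerHandle are hypothetical names, not JCS classes, and in a real RMI client the deregister step would be a remote call while the teardown step would correspond to something like UnicastRemoteObject.unexportObject.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the server's list of event listeners.
class ListenerRegistry {
    private final Map<Long, String> listeners = new ConcurrentHashMap<>();

    void register(long listenerId, String host) {
        listeners.put(listenerId, host);
    }

    void deregister(long listenerId) {
        listeners.remove(listenerId);
    }

    boolean isRegistered(long listenerId) {
        return listeners.containsKey(listenerId);
    }
}

// Hypothetical client-side handle for one cache region's listener.
class ClientListenerHandle {
    private final long listenerId;
    private final ListenerRegistry server;
    private boolean exported = true;

    ClientListenerHandle(long listenerId, String host, ListenerRegistry server) {
        this.listenerId = listenerId;
        this.server = server;
        server.register(listenerId, host);
    }

    // Step 1: tell the server to stop queueing events for this listener.
    // Step 2: tear down the local endpoint, even if step 1 failed.
    void dispose() {
        try {
            server.deregister(listenerId);
        } finally {
            exported = false;
        }
    }

    boolean isExported() {
        return exported;
    }
}

class ClientShutdownDemo {
    public static void main(String[] args) {
        ListenerRegistry registry = new ListenerRegistry();
        ClientListenerHandle handle = new ClientListenerHandle(1L, "client-host", registry);
        handle.dispose();
        System.out.println("registered=" + registry.isRegistered(1L));
    }
}
```

The try/finally is there because a remote deregister call can fail. The client should still shut down locally in that case, and the server then falls back to its retry limit.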

Aaron

--- On Mon, 9/7/09, Niall Gallagher <niall@switchfire.com> wrote:

> From: Niall Gallagher <niall@switchfire.com>
> Subject: JCS remote cache client shutdown behaviour
> To: "JCS Developers List" <jcs-dev@jakarta.apache.org>
> Date: Monday, September 7, 2009, 7:50 AM
> Hi,
> 
> I'm wondering if anyone can explain the sequence of steps
> the JCS client
> code is supposed to follow when
> CompositeCacheManager.shutDown() is
> called client-side? We are intermittently seeing high
> memory usage in
> our JCS remote server, which appears to be caused by large
> backlogs of
> event objects queued for delivery to client machines which
> have been
> shut down, even though we are shutting down our client
> machines
> gracefully using the method above. This is certainly aggravated by
> our network's architecture, but I'm not sure whether the root cause
> is a bug in JCS or whether I'm misunderstanding what should happen.
> 
> When we call CompositeCacheManager.shutDown() on a client
> machine, from
> our client-side logs it appears that the dispose() method
> in this object
> is getting called correctly for each cache region:
> http://svn.apache.org/viewvc/jakarta/jcs/trunk/src/java/org/apache/jcs/auxiliary/remote/RemoteCacheListener.java?view=markup
> 
> However that method appears to just unexport the RMI
> RemoteCacheListener
> object for each region client-side; basically terminating
> the
> client-side end of the event delivery connection. Before
> disconnecting
> though, shouldn't this method notify the server that the
> client is about
> to disconnect?
> 
> Subsequently we often see errors like this in the remote
> server log:
> 
> 
> 07-Sep 13:52:13,347 INFO  [jcs.engine.CacheEventQueue] Error while running event from Queue: RemoveEvent for [GAN: groupId=[groupId=<region name>, defaultGroup], attrName=<cache key>]. Retrying...
> 07-Sep 13:52:13,747 WARN  [jcs.engine.CacheEventQueue] java.rmi.ConnectException: Connection refused to host: <client machine ip address>; nested exception is:
>         java.net.ConnectException: Connection refused
> 07-Sep 13:52:13,748 WARN  [jcs.engine.CacheEventQueue] Error while running event from Queue: RemoveEvent for [GAN: groupId=[groupId=<region name>, defaultGroup], attrName=<cache key>]. Dropping Event and marking Event Queue as non-functional.
> 
> 
> ...this implies the remote server continues to try to
> deliver events to
> the JCS client which disconnected, as if the client didn't
> de-register
> itself before disconnecting.
> 
> Perhaps I've missed something in the code.
> 
> I see that the RemoteCacheServer API (to which clients
> connect) does in
> fact have a server-side dispose() method which (on initial
> investigation) would "de-register" the client from the
> server's list of
> event listeners. Could it be that JCS clients are simply not
> calling this method?
> http://svn.apache.org/viewvc/jakarta/jcs/trunk/src/java/org/apache/jcs/auxiliary/remote/server/RemoteCacheServer.java?view=markup
> 
> 
> This issue is a problem for us depending on which network
> subnet the
> client machine is in. Basically our network is divided into
> 2 subnets,
> with a fairly rubbish (or overly-strict) router/firewall
> between the two
> subnets. This router does not relay networking errors (ICMP
> error
> messages) between the two subnets. When a machine in one
> subnet goes
> offline and a machine in the other subnet tries to connect
> to it, our
> router does not notify the source machine that the target
> machine is
> offline, and so the source machine waits indefinitely (i.e.
> with a
> socket in the open wait state) for a response from the
> target machine.
> On the other hand when both machines are in the same
> subnet, the source
> machine gets a "host not reachable" exception immediately
> when a target
> machine is offline.
> 
> Anyway... the problem is when we shut down a client machine
> in a
> different subnet, the JCS remote server builds up a large
> backlog of
> cache event objects, presumably trying to connect to a
> disconnected
> client, and eventually runs out of memory. We determined this using
> the JDK's jmap command: we found a large number of PutEvent and
> RemoveEvent objects in the remote server's memory. We don't have the
> issue when both
> machines are in the same subnet, but I wonder if that's
> because JCS
> remote server is relying on the networking errors, and is
> de-registering
> clients automatically after a certain number of failed attempts to
> connect to the client; i.e. perhaps clients are not de-registering
> themselves gracefully from the remote server in the first place.
> 
> Does anyone have any experience with this? Does anyone regularly see
> "Error while running event from Queue" in the remote server logs?
> I realise our network setup is partly to blame here, but perhaps the
> root cause is that clients are not de-registering properly.
> 
> Many thanks in advance,
> 
> Niall
> 
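
On the quoted point about connections to an offline client in the other subnet hanging indefinitely: independent of any fix to the shutdown process, one general RMI technique is a client socket factory that bounds connect and read times. The sketch below uses only standard JDK classes. TimeoutClientSocketFactory is a hypothetical name, the timeout values are placeholders, and whether JCS lets you plug in such a factory for its remote cache listeners is not something this thread establishes.

```java
import java.io.IOException;
import java.io.Serializable;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.rmi.server.RMIClientSocketFactory;

// Sketch: an RMI client socket factory that fails fast instead of waiting forever.
class TimeoutClientSocketFactory implements RMIClientSocketFactory, Serializable {
    private static final long serialVersionUID = 1L;

    private final int connectTimeoutMillis;
    private final int readTimeoutMillis;

    TimeoutClientSocketFactory(int connectTimeoutMillis, int readTimeoutMillis) {
        this.connectTimeoutMillis = connectTimeoutMillis;
        this.readTimeoutMillis = readTimeoutMillis;
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        Socket socket = new Socket();
        try {
            // Throws SocketTimeoutException if no connection is made in time.
            socket.connect(new InetSocketAddress(host, port), connectTimeoutMillis);
            // Bounds how long a blocked read waits for a reply.
            socket.setSoTimeout(readTimeoutMillis);
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }

    // RMI compares socket factories, so equal settings should compare equal.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof TimeoutClientSocketFactory)) {
            return false;
        }
        TimeoutClientSocketFactory that = (TimeoutClientSocketFactory) other;
        return connectTimeoutMillis == that.connectTimeoutMillis
                && readTimeoutMillis == that.readTimeoutMillis;
    }

    @Override
    public int hashCode() {
        return 31 * connectTimeoutMillis + readTimeoutMillis;
    }
}
```

A factory like this is supplied when a remote object is exported, for example through UnicastRemoteObject.exportObject(obj, port, csf, ssf). It is serialized with the stub, so the side that calls back through the stub is the one that applies the timeouts and needs the class on its classpath. With a bounded connect time, each failed delivery attempt returns promptly, so a retry limit on the server can take effect instead of events queueing behind a hung socket.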


