james-server-dev mailing list archives

From Stefano Bagnara <apa...@bago.org>
Subject Re: Decision required for JAMES-603
Date Sat, 02 Sep 2006 17:19:40 GMT
Noel J. Bergman wrote:
> The following is a summary of the problem.
>   1) It occurs ONLY when using JDBCSpoolRepository for RemoteDelivery
>   2) If there are more items in the spool than fit in the cache, it is
>      possible to delay delivery for messages that ought to be delivered.
>   3) If iterating through the cache takes more than one second, it is
>      possible to spinloop.
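Point 2 can be illustrated with a simplified model. This is a hypothetical sketch, not the JDBCSpoolRepository code. It assumes the cache is filled with the first maxcache keys ordered by last_updated, and that the retry-delay ("accept") check runs only afterwards. The names BoundedCacheModel, Msg and deliverableInPass are made up for illustration. Under those assumptions, a message that is already due but sorts after the first maxcache entries is never seen in that pass.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/**
 * Hypothetical model of point 2 (not the JAMES code): the cache is loaded with
 * the first maxCache keys ordered by lastUpdated, and only then filtered by the
 * per-message retry delay.
 */
public class BoundedCacheModel {
    public record Msg(String key, long lastUpdated, long retryDelay) {
        // Assumed accept rule: due once retryDelay ms have passed since lastUpdated.
        boolean isDue(long now) { return now - lastUpdated >= retryDelay; }
    }

    public static List<String> deliverableInPass(List<Msg> spool, int maxCache, long now) {
        List<Msg> sorted = new ArrayList<>(spool);
        sorted.sort(Comparator.comparingLong(Msg::lastUpdated));
        // The bounded cache only sees the oldest maxCache entries.
        List<Msg> cache = sorted.subList(0, Math.min(maxCache, sorted.size()));
        List<String> due = new ArrayList<>();
        for (Msg m : cache) {
            if (m.isDue(now)) {
                due.add(m.key());
            }
        }
        return due;
    }

    public static void main(String[] args) {
        List<Msg> spool = List.of(
            new Msg("a", 0, 10_000),   // waiting for a retry, not due at now=1000
            new Msg("b", 10, 10_000),  // waiting for a retry, not due at now=1000
            new Msg("c", 20, 0));      // due immediately, but third in line
        System.out.println(deliverableInPass(spool, 2, 1000)); // []
        System.out.println(deliverableInPass(spool, 3, 1000)); // [c]
    }
}
```

With a cache of 2, message c is held back behind two messages that are still waiting for their retry delay, even though c itself could be delivered right away.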

I'm investigating the problem further. It happened again today, even 
though I had raised maxcache to 10000 and had fewer than 10000 messages. 
So something weird is happening and I need to look into it more closely.
Furthermore, I have the impression that I have a similar issue on the 
main spool manager too... Maybe there are multiple problems, and I have 
to fix some of them before I can check for the others.

> There are a variety of approaches.  One is to fix it.  So far neither
> Stefano nor I (not that I've had much time to look, but he spent all day on
> it), have come up with a trivial fix.  The types of fixes for this code
> would push back release for weeks.  At that point I might as well implement
> the right long-term change, planned for the next release, rather than a
> one-off bandaid to resolve v2.3.

The long-term change needs a DB change, and we decided to keep the DB 
structure unchanged until 3.0, so IMHO we need a fix for 2.3 and 2.4 that 
doesn't involve changing the DB to replace last_updated with 
next_processing_time or something similar.
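For comparison, here is a rough sketch of why a next_processing_time column would help. It is hypothetical: the class and method names are made up, and the filter, sort and limit steps only stand in for a query along the lines of "WHERE next_processing_time <= now ORDER BY next_processing_time". Because the "is it due?" test moves into the query, the bounded cache would only ever hold messages that can be processed now.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical model of the long-term idea: each message stores the time at
 * which it should next be processed, so the due check happens before the
 * cache limit is applied, not after.
 */
public class NextProcessingTimeModel {
    public record Msg(String key, long nextProcessingTime) {}

    public static List<String> loadCache(List<Msg> spool, int maxCache, long now) {
        return spool.stream()
            .filter(m -> m.nextProcessingTime() <= now)                   // like WHERE ... <= now
            .sorted(Comparator.comparingLong(Msg::nextProcessingTime))    // like ORDER BY
            .limit(maxCache)                                              // cache size bound
            .map(Msg::key)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Msg> spool = List.of(
            new Msg("a", 10_000),   // retry scheduled for later
            new Msg("b", 10_010),   // retry scheduled for later
            new Msg("c", 20));      // due now
        System.out.println(loadCache(spool, 2, 1000)); // [c]
    }
}
```

With the same cache size of 2, the message that is actually due is picked up immediately, whatever the number of messages still waiting for a retry.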

> Alternatively, we could add a configuration parameter for the hardcoded
> timeout value (there is already one for the cache size), document the
> potential problem, and release JAMES v2.3.

IMHO the problem is not the timeout: the timeout is there to prevent all 
the threads from running the same query on the repository when there are 
no messages. Without the timeout you would need 50 queries to decide that 
there is nothing to do; with the timeout, this is avoided. Increasing the 
timeout is a hack, and it would work only because we already have another 
hack that wakes our threads up every 60 seconds (we don't do this for file 
repositories, which behave better with respect to this issue).
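The purpose of the timeout can be sketched like this. It is a hypothetical illustration, not the actual JDBCSpoolRepository code. The class EmptySpoolGuard, its methods and the injectable clock are all assumptions. Once one thread's query finds the spool empty, the other threads skip the query until the timeout has elapsed.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: remembers when a query last found the spool empty so
 * that other threads do not repeat the same query within timeoutMillis.
 */
public class EmptySpoolGuard {
    private final long timeoutMillis;
    private final LongSupplier clock;
    // Time at which a query last found nothing; Long.MIN_VALUE means "never".
    private final AtomicLong lastEmptyAt = new AtomicLong(Long.MIN_VALUE);

    public EmptySpoolGuard(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
    }

    /** True if the calling thread should actually run the DB query. */
    public boolean shouldQuery() {
        long last = lastEmptyAt.get();
        return last == Long.MIN_VALUE || clock.getAsLong() - last >= timeoutMillis;
    }

    /** Called by the thread whose query returned no messages. */
    public void recordEmpty() {
        lastEmptyAt.set(clock.getAsLong());
    }

    /** Hypothetical hook: new mail was stored, so threads may query again at once. */
    public void reset() {
        lastEmptyAt.set(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        long[] now = {0};
        EmptySpoolGuard guard = new EmptySpoolGuard(1000, () -> now[0]);
        int queries = 0;
        for (int thread = 0; thread < 50; thread++) {   // 50 threads, spool empty
            if (guard.shouldQuery()) {
                queries++;
                guard.recordEmpty();
            }
        }
        System.out.println(queries); // 1
    }
}
```

With 50 threads polling an empty spool within the same timeout window, only the first one reaches the database. Raising timeoutMillis only changes how long the others keep skipping the query, which is why it does not address the cache problem itself.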

> I do not want to just remove the cache, which is one of Stefano's
> suggestions.  The cache prevents JAMES from crashing when the message
> arrival rate is higher than it can process.  Throwing OOMs and possibly
> discarding messages in the process is not acceptable.

I think that right now we have behaviour that is buggy and difficult to 
understand and to solve. I want to have it fixed on my system before 
deciding what to do about 2.3.0.

My preferred solution, for now, is not removing the cache but completely 
rewriting the caching algorithm and the accept mechanism without changing 
the DB.

> Recognize that part of the problem is the conflating of the RemoteDelivery
> spool and the main pipeline spool, which have different requirements, since
> the former applies scheduling on top of the spool.  Again, that's on the
> roadmap to change, but wasn't planned for v2.3.
> 	--- Noel

Well, we have a bug, and we may need to change the original plan.
I still think there is more to discover about this issue, so I will 
discuss possible solutions later, once I have investigated this hard 
issue a little more.
