james-server-dev mailing list archives

From "Bernd Fondermann" <bernd.fonderm...@googlemail.com>
Subject Re: Confirmed but unidentified memory leak in RC2
Date Mon, 11 Sep 2006 17:44:44 GMT
On 9/11/06, Stefano Bagnara <apache@bago.org> wrote:
> Bernd Fondermann wrote:
> > So how do we reproduce this beast? Do we have a "no load" reproduction
> > raising OOMs?
>
> So until Noel decides to use a real profiler (with per-instance
> allocation-time tracking) instead of hprof :-P (which leaves us guessing
> where the problem is), or at least provides us with his config.xml, I'm
> stopping my tests.

fair enough.

> About your comparison between 2.2.0 and 2.3.0: it should be done in a
> controlled environment (Postage). I don't expect the number of spam
> connections to Noel's server these days to be the same as a year
> ago, or even a few months ago. My personal server handles on average
> almost 5 times the traffic it did 1 year ago, while I receive/send the
> same amount of valid email.

my point was, maybe this is not a newly introduced bug... but
validating this seems to be impossible since the mail server world
changed so heavily, as you suggest.

> Furthermore I think that 2.3.0 is already faster than 2.2.0, but I don't
> care too much about this issue: imo the real goal is that 2.3.0 should be
> more RFC compliant and have fewer bugs (or at least less critical ones)
> than 2.2.0. If you look at the changelog you will see that 2.2.0 has
> really bad bugs. If 2.3.0 fixes those bugs (without introducing worse
> ones) and works 5% slower, it would be a good thing anyway.

agreed, but as I understand it, performance _is_ an issue given the
growing traffic and the spooling defects that have recently surfaced.

> While writing this mail, curiosity led me to test my current Postage
> setup against 2.2.0 and 2.3.0-current.

yes, I did this before, too. you can really quickly drive 2.2 against
the wall, if you want.

<snip/>
> So I increased the memory for James 2.2.0 to 10M. After 1 minute Postage
> says "unmatched: 299, matched: 0". I don't know if this is an
> incompatibility between James 2.2 and Postage, but I also see all the
> messages still in the spool. After 2 minutes "unmatched: 586, matched:
> 5", after 3 minutes "unmatched: 878, matched: 13", then OOM again.
>
> This time I increased -Xmx to 20M. After 3 minutes again "unmatched:
> 879, matched: 14", but this time no OOM. At the end of the test: 1451
> unmatched (UNDELIVERED) and 31 matched.

When it comes to sending/receiving email, Postage should be completely
mail-server agnostic: only SMTP/POP3 is used.

<snip/>
> After the 5 minutes I had almost 500 messages in outgoing
> that were ALL successfully delivered to Postage during the 2-minute
> window Postage waits at the end of the test. So the raw result was:
> "matched: 2616, unmatched: 1". Isn't this cool?

Well, the "1" is a bug in Postage, not so cool ;-)
But yes, it is quite impressive. I also noticed similar behavior. 2.3
is a substantial improvement over 2.2, but you know better than I do
:-)
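For readers who haven't looked at Postage's report: a minimal sketch of how matched/unmatched counts of this kind could be tallied, assuming every test mail carries a unique ID that survives the SMTP/POP3 round trip. The class and method names are hypothetical and this is not Postage's actual code. "Unmatched" here simply means sent but not (yet) received, so mid-run it includes mails still in flight, and only at the end of the run does it mean UNDELIVERED.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch of matched/unmatched accounting for a mail load test.
 * Assumes each test mail carries a unique ID; NOT Postage's real implementation.
 */
public class MatchTally {
    // IDs of mails sent via SMTP that have not been received back yet.
    private final Set<String> pending = new HashSet<>();
    private int matched = 0;

    /** Record a mail handed to the server under test. */
    public synchronized void sent(String id) {
        pending.add(id);
    }

    /** Record a mail fetched back (e.g. via POP3); unknown or duplicate IDs are ignored. */
    public synchronized void received(String id) {
        if (pending.remove(id)) {
            matched++;
        }
    }

    public synchronized int matched() {
        return matched;
    }

    /** Sent but not (yet) received; at the end of a run these are UNDELIVERED. */
    public synchronized int unmatched() {
        return pending.size();
    }

    public static void main(String[] args) {
        MatchTally t = new MatchTally();
        for (int i = 0; i < 5; i++) {
            t.sent("mail-" + i);
        }
        t.received("mail-0");
        t.received("mail-3");
        t.received("unknown"); // never sent by this test, so it is ignored
        System.out.println("matched: " + t.matched() + ", unmatched: " + t.unmatched());
    }
}
```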

<snip/>
> Please note that I had never tested 2.2.0 before: this is simply my current
> "random" 2.3.0 test applied to 2.2.0. And the difference is imo
> *impressive*: James 2.2.0 needed 20MB for the 300 mail/minute test, James
> 2.3.0 did 500 mail/minute (under stress) using 5MB and without
> throwing OOM. Furthermore James 2.2.0 had an increasing spool size while
> 2.3.0 had no problem with the spool. Again, this is not a test to
> demonstrate that 2.3.0 is better than 2.2.0. Maybe there are tests that
> work better on 2.2.0, or tests that work even better on 2.3.0. But as a
> first try I would simply say *WOW*.

That is why I think that "memory leak" is a delicate term.
If what we currently have is "the best James ever", we should release pronto.
But I expect problems to pop up after such a long time with no
release and such big changes to the code base.

If today someone steps up and says, "I have this config and only
changed releases, and before it ran for 10 days until restart and now
it's only 5 hours", it would not be acceptable to release the
software, because the server would not be ready for enterprise
production purposes.
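One way to tell a real leak from load-dependent memory use over such a long run: a minimal sketch that samples used heap right after a requested GC, via the standard java.lang.management API, and checks whether that floor keeps climbing. The class name and the "strictly rising" heuristic are assumptions for illustration only. MemoryMXBean.gc() is just a hint to the JVM, the code measures the JVM it runs in, and a real soak test would sample over hours with a noise-tolerant trend check.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: record used heap after a requested GC and report
 * whether the post-GC floor rises on every sample (a crude leak signal).
 */
public final class HeapFloorWatcher {
    private final MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
    private final List<Long> samples = new ArrayList<>();

    /** Request a GC (only a hint) and record the used heap in bytes. */
    public long sample() {
        memory.gc();
        long used = memory.getHeapMemoryUsage().getUsed();
        samples.add(used);
        return used;
    }

    public List<Long> samples() {
        return samples;
    }

    /** True if there are at least two values and each is larger than the previous one. */
    public static boolean strictlyRising(List<Long> values) {
        if (values.size() < 2) {
            return false;
        }
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i) <= values.get(i - 1)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        HeapFloorWatcher w = new HeapFloorWatcher();
        for (int i = 0; i < 3; i++) {
            System.out.println("used heap after GC: " + w.sample() + " bytes");
            Thread.sleep(1000); // a real soak test would wait minutes between samples
        }
        System.out.println("floor strictly rising: " + strictlyRising(w.samples()));
    }
}
```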

> Bottom line: I would really love to have an "adaptive" Postage
> configuration that simply tries to find the maximum flow for
> the given configuration.

:-) This would mean a not-so-small refactoring, because the triggering
of "taking samples", i.e. sending mails, has to be completely revised.
With the changes I am about to check in you are only able to increase
the mail sizes, which is not the same thing. Adaptive load is coming later,
unless someone steps up with a solution. I'd be happy to help.
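A minimal sketch of what such an adaptive search could look like, assuming a hypothetical probe that runs one fixed-rate test and reports whether the server kept up (e.g. no OOM, spool not growing, unmatched count bounded). It doubles the rate until the probe fails, then bisects between the last good and the first bad rate, so it needs only a logarithmic number of full test runs. It also assumes monotonicity (a server that sustains rate r sustains any lower rate), which a real server with warm-up and GC effects may not strictly satisfy.

```java
import java.util.function.IntPredicate;

/**
 * Hypothetical sketch of an adaptive load search for the highest
 * sustainable mails-per-minute rate. The probe stands in for one
 * fixed-rate test run and is an assumption, not a Postage API.
 */
public final class AdaptiveLoad {

    /**
     * Returns the highest rate in [1, maxRate] for which the probe succeeds,
     * or 0 if even rate 1 fails. Assumes success is monotonic in the rate.
     */
    public static int maxSustainableRate(IntPredicate probe, int maxRate) {
        if (maxRate < 1 || !probe.test(1)) {
            return 0;
        }
        // Phase 1: double the rate until the probe fails or maxRate is reached.
        int good = 1;
        int bad = -1;
        while (bad < 0) {
            int next = (int) Math.min((long) good * 2, maxRate);
            if (next == good) {
                return good; // maxRate itself was sustained
            }
            if (probe.test(next)) {
                good = next;
            } else {
                bad = next;
            }
        }
        // Phase 2: bisect between the last sustained and the first failing rate.
        while (bad - good > 1) {
            int mid = good + (bad - good) / 2;
            if (probe.test(mid)) {
                good = mid;
            } else {
                bad = mid;
            }
        }
        return good;
    }

    public static void main(String[] args) {
        // Stand-in probe: pretend the server copes with up to 437 mails/minute.
        IntPredicate fakeServer = rate -> rate <= 437;
        System.out.println(maxSustainableRate(fakeServer, 10_000));
    }
}
```

Because each probe is a full timed run (minutes each, as in the 2.2.0/2.3.0 comparison above), keeping the number of probes logarithmic matters more here than in a typical in-memory search.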

  Bernd
