james-server-dev mailing list archives

From Bernd Fondermann <bf_...@brainlounge.de>
Subject [mime4j] Storage resource management [long]
Date Thu, 09 Oct 2008 21:26:05 GMT

It's high time to move the discussion back to server-dev, methinks.

Until now, this has mainly been discussed in JIRA issue MIME4J-72
and in the thread
   MIME4J - remove parent reference from Body?

Thanks to Markus Wiederkehr for starting this initiative.

The underlying requirement is that we don't want large email parts to 
be held in memory, because either a single part exceeds the available 
heap or a set of parts does in sum.

So the requirement is to have those large parts stored on disk.

The current mime4j implementation uses "temporary storage", making use 
of temporary files as supported by the JDK.

This choice is made easy by the JDK, yet it has its shortcomings.
Temporary files of this kind might not be so temporary: they are cleaned 
up only on JVM termination. For long-running applications, where the JVM 
is rarely terminated, it's easy to run out of disk space and file 
descriptors.
Thus it was proposed to extend the mime4j API to dispose of temporary 
files explicitly, making them shorter-lived.
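A minimal sketch of what explicit disposal could look like, assuming the current code relies on something like File.deleteOnExit(), which removes files only when the JVM exits. DisposableTempFile and its methods are hypothetical names for illustration, not mime4j API. Note that dispose() gives the object lifecycle state, so it has to be made idempotent and safe to call from several threads.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical wrapper (not part of mime4j): deletes its temp file as soon as
 * dispose() is called instead of waiting for JVM termination.
 */
final class DisposableTempFile implements AutoCloseable {
    private final Path path;
    // dispose() may race between threads; only the first call deletes the file.
    private final AtomicBoolean disposed = new AtomicBoolean(false);

    DisposableTempFile(String prefix) throws IOException {
        this.path = Files.createTempFile(prefix, ".tmp");
    }

    /** Returns the backing file; fails once the file has been disposed. */
    Path path() {
        if (disposed.get()) {
            throw new IllegalStateException("already disposed");
        }
        return path;
    }

    /** Deletes the backing file now; calling it again is a no-op. */
    void dispose() throws IOException {
        if (disposed.compareAndSet(false, true)) {
            Files.deleteIfExists(path);
        }
    }

    /** Allows try-with-resources to dispose the file automatically. */
    @Override
    public void close() throws IOException {
        dispose();
    }
}
```

With try-with-resources the file lives exactly as long as the block that uses it, rather than as long as the JVM.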

Another requirement voiced is the ability to copy emails. To save 
resources, it was proposed to implement this by reusing (not 
deep-copying) those objects that are not altered, possibly including 
reusing one temporary file for more than one part.
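One way to let a part and its unmodified copies share a single temporary file is reference counting: copying retains the file, disposing releases it, and the file is deleted only when the last reference goes away. This is a hypothetical sketch (SharedTempFile is not a mime4j class) and it assumes every holder calls release() exactly once, which is precisely the kind of contract that makes combining dispose() with copy-on-write tricky.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical reference-counted backing file shared by a part and its
 * unmodified copies. The file is deleted when the count drops to zero.
 */
final class SharedTempFile {
    private final Path path;
    private final AtomicInteger refCount = new AtomicInteger(1);

    SharedTempFile(Path path) {
        this.path = path;
    }

    Path path() {
        if (refCount.get() <= 0) {
            throw new IllegalStateException("already released");
        }
        return path;
    }

    /** Called when a part is copied: share the file instead of duplicating it. */
    SharedTempFile retain() {
        while (true) {
            int current = refCount.get();
            if (current <= 0) {
                throw new IllegalStateException("cannot retain a released file");
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return this;
            }
        }
    }

    /** Called from dispose(); deletes the file once no holder references it. */
    void release() throws IOException {
        int remaining = refCount.decrementAndGet();
        if (remaining == 0) {
            Files.deleteIfExists(path);
        } else if (remaining < 0) {
            throw new IllegalStateException("released more often than retained");
        }
    }

    int refCount() {
        return refCount.get();
    }
}
```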

I don't think that JDK temp files are necessarily a good choice. I can 
imagine different implementations using a specific disk partition, user 
account, quota, directory: anything that gives developers more control 
over the storage.
An optimization would be to keep parts of up to a few KB in memory and 
only start writing to disk once a part grows beyond that size (or 
available heap drops below a threshold).
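A sketch of the threshold idea, assuming a fixed byte threshold; the heap-based trigger is not modeled. ThresholdStorageOutputStream is a hypothetical name, and it writes to the JDK default temp directory only for brevity; the point above is that a real implementation would let developers choose where spilled parts go.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical stream that buffers a body part in memory and spills it to a
 * temporary file only once it grows beyond {@code threshold} bytes.
 */
final class ThresholdStorageOutputStream extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private Path file;            // null while the data is still in memory
    private OutputStream fileOut;
    private long count;
    private boolean closed;

    ThresholdStorageOutputStream(int threshold) {
        this.threshold = threshold;
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (closed) {
            throw new IOException("stream closed");
        }
        if (file == null && count + len > threshold) {
            spillToDisk();
        }
        if (file == null) {
            memory.write(b, off, len);
        } else {
            fileOut.write(b, off, len);
        }
        count += len;
    }

    /** Copies the buffered bytes to a new temp file and drops the heap buffer. */
    private void spillToDisk() throws IOException {
        file = Files.createTempFile("mime4j-part", ".tmp");
        fileOut = Files.newOutputStream(file);
        memory.writeTo(fileOut);
        memory = null;
    }

    boolean isInMemory() {
        return file == null;
    }

    /** Re-reads the stored content; only valid after close(). */
    InputStream openStream() throws IOException {
        if (!closed) {
            throw new IllegalStateException("call close() first");
        }
        return file == null
                ? new ByteArrayInputStream(memory.toByteArray())
                : Files.newInputStream(file);
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        if (fileOut != null) {
            fileOut.close();
        }
    }

    /** Closes the stream and deletes the spill file, if one was created. */
    void dispose() throws IOException {
        close();
        if (file != null) {
            Files.deleteIfExists(file);
        }
    }
}
```

Small parts never touch the disk at all, so the temp-file lifecycle problem only arises for the parts that actually need it.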
All this would require more sophisticated resource management than 
what we have now. Moreover, I firmly believe that the current temp file 
code already requires it once disposal and copy-on-write are combined!

What we have now (without dispose()) is a more or less immutable, 
thread-safe, kind-of-idempotent object model. This changes fundamentally 
with the introduction of dispose(). Not necessarily a bad thing, but 
worth noting and discussing.

I suggest we take a close look at the recent changes and the proposed 
patches concerning resource management. Let's not include them in the 
upcoming release (or let's delay the release) until we understand them 
and are sure they work properly.

Ideally, I'd like to see this resource management separated from 
the core entity and part object model, so we can test it properly and 
let our users make an informed choice depending on their use cases. For 
short-running, simple usage, the existing (and extended) naive temporary 
file management might be a reasonable choice. For long-running, 
multithreaded applications, a more sophisticated approach (yet to be 
implemented!) might be more appropriate. (And I must admit that I'm a 
user from the latter camp.)
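To illustrate the separation, the object model could depend only on a small strategy interface, with the application choosing the implementation. The StorageProvider and Storage names and methods below are hypothetical for this sketch, not an existing mime4j API. DirectoryStorageProvider covers the "specific partition or directory" case; MemoryStorageProvider stands in for simple, short-running usage. Quotas and user accounts are not modeled.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical strategy the entity model would depend on (not mime4j API). */
interface StorageProvider {
    /** Stores everything read from {@code in} and returns a handle to it. */
    Storage store(InputStream in) throws IOException;
}

/** Hypothetical handle to a stored body part. */
interface Storage {
    InputStream getInputStream() throws IOException;

    /** Releases the underlying resource once the part is no longer needed. */
    void delete() throws IOException;
}

/** Keeps every part on the heap: simple, suitable for small or short-lived use. */
final class MemoryStorageProvider implements StorageProvider {
    @Override
    public Storage store(InputStream in) throws IOException {
        byte[] data = in.readAllBytes();
        return new Storage() {
            @Override
            public InputStream getInputStream() {
                return new ByteArrayInputStream(data);
            }

            @Override
            public void delete() {
                // Nothing to release explicitly; the garbage collector reclaims the array.
            }
        };
    }
}

/** Writes each part into a directory chosen by the application. */
final class DirectoryStorageProvider implements StorageProvider {
    private final Path directory;

    DirectoryStorageProvider(Path directory) {
        this.directory = directory;
    }

    @Override
    public Storage store(InputStream in) throws IOException {
        Path file = Files.createTempFile(directory, "part", ".tmp");
        try (OutputStream out = Files.newOutputStream(file)) {
            in.transferTo(out);
        } catch (IOException e) {
            Files.deleteIfExists(file); // do not leak a half-written file
            throw e;
        }
        return new Storage() {
            @Override
            public InputStream getInputStream() throws IOException {
                return Files.newInputStream(file);
            }

            @Override
            public void delete() throws IOException {
                Files.deleteIfExists(file);
            }
        };
    }
}
```

Because the model only sees the two interfaces, each implementation can be unit-tested on its own, and a long-running server can plug in its own provider without touching the parsing code.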

I am not trying to complicate things artificially, just bringing in my 
own requirements and architectural experience.

Thanks for listening,

