james-server-dev mailing list archives

From "Robert Burrell Donkin" <robertburrelldon...@gmail.com>
Subject Re: [IMAP] Hibernate mailbox [WAS Re: Developer environment]
Date Sun, 12 Aug 2007 11:18:01 GMT
On 8/11/07, Zsombor <gzsombor@gmail.com> wrote:
>
>
> On 8/10/07, Robert Burrell Donkin <robertburrelldonkin@gmail.com> wrote:
> > On 8/10/07, Zsombor <gzsombor@gmail.com> wrote:

<snip>

> > (i'm interested in JDO and would be much more likely to contribute to
> > an OpenJPA implementation than to a plain hibernate one. the OpenJPA
> > team is also very friendly so i'm sure that they'd be happy to help
> > out with architecture.)
> >
> > > I have some question about the current IMAP code in the trunk. I know it's
> > > highly experimental code, and never released, but I'm curious to ask that do
> > > you think that the current API with the
> > > MailboxManagerProvider/MailboxManager/ImapMailboxSession will see
> > > revolutionary, or evolutionary changes ? I mean, that do you plan to rewrite
> > > it from scratch, or only minor method additions/deletions will occur? I
> > > know, you cant promise anything, but I dont want to waste my time, if
> > > someone totally rewrote the backend interface in the last few days, and
> > > intends to commit it  in the near future :)
> >
> > (i had it in mind to move the mailbox code out from core into its own
> > module. we've also talked about renaming some of the interfaces. i'll
> > start a thread on this today.)
> >
> > i've given up trying to make IMAP perform with the torque mailbox
> > implementation. this is partly down to an inefficient table structure
> > but mostly down to inefficiencies baked into the API (common IMAP
> > operations take numerous database calls to execute and bulky message
> > data is too often fetched).
>
> What do you think, which operations are the most common ones?

the IMAP protocol has a lot of redundancy built in so this depends on
the client :-/

i can produce good statistics about the Evolution mail client but i
know that other clients are quite different

here's a typical use case

User opens email client which performs:
 * LOGIN to IMAP server
 * SELECT on a folder
 * FETCH basic meta data for all messages
 * FETCH structural data for all new messages

User reads a seen part of a message:
 * FETCH basic meta-data (client has already cached the mail content)

User reads an unseen part of a MIME multipart message
 * FETCH unseen part of message (and cache)

User reads an unseen single part message
 * FETCH message

User reads an unseen MIME multipart message
 * FETCH meta-data and one part

User moves a message into the folder from INBOX:
 * APPEND message

User clicks on another mailbox
 * SELECT on a folder
 * FETCH basic meta data for all messages
 * FETCH structural data for all new messages

in reality, most clients do a lot more than this but conceptually, this
is reasonably accurate

IMHO there are three groups of calls which are critical to user
perception of performance. the first is mailbox selection, the second
is message meta-data, the third is message content.

IMAP is an unusual protocol and creates challenges in all three areas

> Currently I'm
> trying to figure out what is the main difference between the UID, MSN and
> KEY value, which is unique to the mailbox, and which to the whole
> repository.

IMHO the mailbox design suffers from being a compromise between a good
IMAP API and a good general API. the interface and implementation are
over-complicated but there are many features which are likely to be
IMAP specific in what was intended to be a general API.

MSN and UID are governed by the IMAP specification. KEY is general.

one of the challenges is that IMAP specifies two unique indexes: UID
and MSN. both UID and MSN are unique only within a mailbox. it is
tempting to use UID as a primary key but this would limit the size of
mailboxes to less than the specification allows. it may be possible to
use a computed PK by using byte arithmetic for mailbox and UID.
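
as a rough sketch of the computed PK idea (the class and method names
here are hypothetical, and it assumes the mailbox id fits in 32 bits
alongside the 32-bit UID that the IMAP specification defines):

```java
/** Hypothetical helper, not part of the existing mailbox API. */
final class MessageKey {
    private MessageKey() {}

    /** Packs a mailbox id (high 32 bits) and an IMAP UID (low 32 bits) into one long. */
    static long compose(int mailboxId, long uid) {
        // IMAP UIDs are non-zero unsigned 32-bit values.
        if (uid < 1 || uid > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("UID out of range: " + uid);
        }
        return ((long) mailboxId << 32) | uid;
    }

    /** Recovers the mailbox id from a composed key. */
    static int mailboxId(long key) {
        return (int) (key >>> 32);
    }

    /** Recovers the UID from a composed key. */
    static long uid(long key) {
        return key & 0xFFFFFFFFL;
    }
}
```

this keeps the full UID range available in every mailbox, at the cost
of a 64-bit key column.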

> My main concern with the current torque backend is that it
> currently tries to check the modifications in-VM, and delivers the
> notification synchronously, instead of using a polling db-thread, which
> i think would be better.

(using a JCR allows registration for events rather than polling but
that's just a detail)

the synchronous notification stuff ATM is a side channel for data
changed by the current client. in other words, it is used to inform
the client when it has altered data. the session-based design does not
allow the client to be informed about changes made by other clients.

i would much prefer an event driven approach to asynchronous notifications
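
a minimal sketch of the listener registration style, with hypothetical
names throughout (nothing here is existing James API). dispatch is
in-VM and on the publishing thread; changes made through another
frontend would still need some feed into publish(), and a real version
would probably hand events to an executor:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical event type: what changed, and in which mailbox. */
final class MailboxChange {
    enum Kind { ADDED, EXPUNGED, FLAGS_UPDATED }

    final String mailbox;
    final long uid;
    final Kind kind;

    MailboxChange(String mailbox, long uid, Kind kind) {
        this.mailbox = mailbox;
        this.uid = uid;
        this.kind = kind;
    }
}

/** Hypothetical callback that each IMAP session would register. */
interface MailboxChangeListener {
    void onChange(MailboxChange change);
}

/** Hypothetical dispatcher: the store publishes, sessions subscribe. */
final class MailboxEventBus {
    // Copy-on-write so sessions can subscribe or leave while events are delivered.
    private final List<MailboxChangeListener> listeners = new CopyOnWriteArrayList<>();

    void subscribe(MailboxChangeListener listener) {
        listeners.add(listener);
    }

    void unsubscribe(MailboxChangeListener listener) {
        listeners.remove(listener);
    }

    /** Delivers the change to every subscribed session, including ones that did not cause it. */
    void publish(MailboxChange change) {
        for (MailboxChangeListener listener : listeners) {
            listener.onChange(change);
        }
    }
}
```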

> (So multiple IMAP frontends can be deployed for one backend
> database).

this is not trivial for IMAP

for example, maintaining message numbers is challenging. one possible
approach would be to avoid storing message numbers in the database and
maintain them in each IMAP frontend.
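
a minimal sketch of keeping message numbers per session rather than in
the database. the class is hypothetical and assumes the frontend is
told the UIDs in the mailbox at SELECT time and is notified of later
additions and expunges:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical per-session view: MSNs are derived from position and never stored. */
final class SessionMessageNumbers {
    private final List<Long> uids = new ArrayList<>(); // kept in ascending UID order

    SessionMessageNumbers(List<Long> uidsInMailbox) {
        uids.addAll(uidsInMailbox);
        Collections.sort(uids);
    }

    /** Returns the 1-based message sequence number, or -1 if the UID is not in this view. */
    int msnOf(long uid) {
        int index = Collections.binarySearch(uids, uid);
        return index >= 0 ? index + 1 : -1;
    }

    /** Returns the UID currently at the given 1-based message sequence number. */
    long uidOf(int msn) {
        return uids.get(msn - 1);
    }

    /** UIDs only ever increase within a mailbox, so a new message goes on the end. */
    void added(long uid) {
        if (!uids.isEmpty() && uid <= uids.get(uids.size() - 1)) {
            throw new IllegalArgumentException("UID not ascending: " + uid);
        }
        uids.add(uid);
    }

    /** Removes the message and returns its old MSN; every later message shifts down by one. */
    int expunged(long uid) {
        int msn = msnOf(uid);
        if (msn > 0) {
            uids.remove(msn - 1); // remove by index, not by value
        }
        return msn;
    }
}
```

each frontend can then renumber on its own schedule, which matters
because a session may only be told about an expunge at certain points
in the protocol.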

> > i'm interested in JCR backends (rather than DB) so i have no plans
> > to fix these problems. my plan for the experimental code is to
> > introduce a new API and provide an adapter for the mailbox API
> > (rather than rewrite it). i hope that this would allow mailbox
> > implementations which wish to optimise themselves for IMAP to do so,
> > while a slow but working IMAP would still be possible even with a
> > plain implementation.
> >
> > that isn't to say that the mailbox API is fixed. it's in need of a
> > review. there's a lot of unintuitive naming and lots is not javadoc'd.
> > i suspect that as you implement you'll find issues and
> > inefficiencies.
>
>
> Yeah, i found some. For example the
> getMailboxManager(user).createInbox(user) call. Or
> something like that.

a good example is IMAP SELECT. this requires 8 separate calls to the
mailbox API each of which makes multiple calls to the database.
several of these calls require multiple table joins. SELECT is very
slow. i would prefer an API with a single call which returns meta-data
about the mailbox using a fetch group to specify what data is
required.
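
a rough sketch of what a single call with a fetch group could look
like. all names are hypothetical, and the in-memory class is only a
stand-in for a real backend that would satisfy the whole request in
one query:

```java
import java.util.Set;

/** Hypothetical fetch group: the pieces of mailbox meta-data a caller can ask for. */
enum MailboxDatum { EXISTS, FIRST_UNSEEN, UID_NEXT, UID_VALIDITY }

/** Hypothetical result holder; anything not requested stays null. */
final class MailboxMetaData {
    Integer exists;
    Integer firstUnseen; // message sequence number of the first unseen message
    Long uidNext;
    Long uidValidity;
}

/** Hypothetical API: one call per SELECT, however many items are requested. */
interface MailboxStore {
    MailboxMetaData getMetaData(String mailboxName, Set<MailboxDatum> fetchGroup);
}

/** Stand-in backend holding a single mailbox in memory. */
final class InMemoryMailboxStore implements MailboxStore {
    private final long[] uids;     // ascending
    private final boolean[] seen;  // parallel to uids
    private final long uidValidity;
    int calls;                     // counts round trips to the "backend"

    InMemoryMailboxStore(long[] uids, boolean[] seen, long uidValidity) {
        this.uids = uids;
        this.seen = seen;
        this.uidValidity = uidValidity;
    }

    @Override
    public MailboxMetaData getMetaData(String mailboxName, Set<MailboxDatum> fetchGroup) {
        calls++; // mailboxName is ignored: this stand-in only has one mailbox
        MailboxMetaData data = new MailboxMetaData();
        if (fetchGroup.contains(MailboxDatum.EXISTS)) {
            data.exists = uids.length;
        }
        if (fetchGroup.contains(MailboxDatum.FIRST_UNSEEN)) {
            for (int i = 0; i < seen.length; i++) {
                if (!seen[i]) { data.firstUnseen = i + 1; break; }
            }
        }
        if (fetchGroup.contains(MailboxDatum.UID_NEXT)) {
            data.uidNext = uids.length == 0 ? 1L : uids[uids.length - 1] + 1;
        }
        if (fetchGroup.contains(MailboxDatum.UID_VALIDITY)) {
            data.uidValidity = uidValidity;
        }
        return data;
    }
}
```

the caller builds the group with something like
EnumSet.of(MailboxDatum.EXISTS, MailboxDatum.FIRST_UNSEEN), so an
implementation can skip the joins for anything that was not asked for.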

- robert


