james-server-dev mailing list archives

From "Benoit Tellier (Jira)" <server-...@james.apache.org>
Subject [jira] [Updated] (JAMES-3576) Further denormalize Message entity?
Date Mon, 03 May 2021 13:50:00 GMT

     [ https://issues.apache.org/jira/browse/JAMES-3576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benoit Tellier updated JAMES-3576:
----------------------------------
    Description: 
h3. The facts

Here is our message structure:

{code:java}
cqlsh:apache_james> DESCRIBE TABLE imapuidtable ;
CREATE TABLE apache_james.imapuidtable (
    messageid timeuuid,
    mailboxid timeuuid,
    uid bigint,
    flaganswered boolean,
    flagdeleted boolean,
    flagdraft boolean,
    flagflagged boolean,
    flagrecent boolean,
    flagseen boolean,
    flaguser boolean,
    modseq bigint,
    userflags set<text>,
    PRIMARY KEY (messageid, mailboxid, uid)
) WITH comment = 'Holds mailbox and flags for each message, lookup by message ID';

cqlsh:apache_james> DESCRIBE TABLE messageidtable  ;
CREATE TABLE apache_james.messageidtable (
    mailboxid timeuuid,
    uid bigint,
    flaganswered boolean,
    flagdeleted boolean,
    flagdraft boolean,
    flagflagged boolean,
    flagrecent boolean,
    flagseen boolean,
    flaguser boolean,
    messageid timeuuid,
    modseq bigint,
    userflags set<text>,
    PRIMARY KEY (mailboxid, uid)
) WITH comment = 'Holds mailbox and flags for each message, lookup by mailbox ID + UID';

cqlsh:apache_james> DESCRIBE TABLE messagev3  ;
CREATE TABLE apache_james.messagev3 (
    messageid timeuuid PRIMARY KEY,
    bodycontent text,
    bodyoctets bigint,
    bodystartoctet int, 
    attachments list<frozen<attachments>>,
   // and also message properties
) WITH comment = 'Holds message metadata, independently of any mailboxes. Content of messages
is stored in `blobs` and `blobparts` tables. Optimizes property storage compared to V2.';
{code}

A very common pattern is to access message headers.

 - imap-reorg.png (attached) shows me opening my IMAP mailbox after a long weekend. We can
see that my MUA lists the headers of the 108 messages received in the meantime. In order to
retrieve the storage information, the messagev3 table needs to be accessed for each message,
generating a huge count of PRIMARY KEY reads that are not strictly necessary. Reading messageV3
takes second place in query time.

 - Similar things happen on top of JMAP. jmap-reorg.png shows 2 webmail email list loads.
Same thing: for each message entry, we need to query messagev3 to retrieve the storage information
needed to retrieve the headers. Here messagev3 reads take first place, ahead of the message
metadata reads and the header reads.
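
To make the read amplification concrete, here is a rough Python sketch that counts reads for one
header listing. The read pattern is an assumption drawn from the description above (one listing
read on the mailbox partition, then one messagev3 read and one header read per message). The names
ReadCounts and current_model_reads are hypothetical and are not James code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadCounts:
    """Number of Cassandra reads issued per source for one header listing."""
    message_id_table: int
    message_v3: int
    header_reads: int

    @property
    def total(self) -> int:
        return self.message_id_table + self.message_v3 + self.header_reads

    @property
    def message_v3_share(self) -> float:
        """Fraction of all reads that hit messagev3."""
        return self.message_v3 / self.total


def current_model_reads(message_count: int) -> ReadCounts:
    # Assumption: a single read on the (mailboxid, uid) table lists the entries.
    # Each message then needs one messagev3 read to learn where its headers are
    # stored, and one more read to fetch the headers themselves.
    return ReadCounts(
        message_id_table=1,
        message_v3=message_count,
        header_reads=message_count,
    )


if __name__ == "__main__":
    counts = current_model_reads(108)
    print(counts.total, counts.message_v3)
```

Under these assumptions, the 108 messages of the IMAP example cost 217 reads, 108 of which hit
messagev3 only to locate the headers.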

h3. The bit of Cassandra philosophy we might have missed...

https://www.datastax.com/blog/basic-rules-cassandra-data-modeling

{code:java}
# Non-Goals

## Minimize the Number of Writes

Writes in Cassandra aren't free, but they're awfully cheap. Cassandra is optimized for high
write throughput, and almost all writes are equally efficient [1].

## Minimize Data Duplication

Denormalization and duplication of data is a fact of life with Cassandra. Don't be afraid
of it. [...] In order to get the most efficient reads, you often need to duplicate data.

# Basic goals

[...]

## Rule 2: Minimize the Number of Partitions Read

[...] Furthermore, even on a single node, it's more expensive to read from multiple partitions
than from a single one due to the way rows are stored.
{code}

https://thelastpickle.com/blog/2017/03/16/compaction-nuance.html

{code:java}
 An incorrect data model can turn a single query into hundreds of queries, resulting in increased
latency, decreased throughput, and missed SLAs.
{code}

(This one is from an article about compaction, but I feel it is very relevant to
the situation I describe, so I could not refrain from quoting it...)

h3. The new data-model

I propose to do the following:

{code:java}
cqlsh:apache_james> ALTER TABLE messageIdTable ADD internalDate timestamp ;
cqlsh:apache_james> ALTER TABLE messageIdTable ADD bodyStartOctet int  ;
cqlsh:apache_james> ALTER TABLE messageIdTable ADD fullContentOctets bigint  ;
cqlsh:apache_james> ALTER TABLE messageIdTable ADD headerContent text  ;

cqlsh:apache_james> ALTER TABLE imapUidTable ADD internalDate timestamp ;
cqlsh:apache_james> ALTER TABLE imapUidTable ADD bodyStartOctet int  ;
cqlsh:apache_james> ALTER TABLE imapUidTable ADD fullContentOctets bigint  ;
cqlsh:apache_james> ALTER TABLE imapUidTable ADD headerContent text  ;
{code}

That way we can easily resolve METADATA and HEADERS FetchGroups against both messageIdTable
and imapUidTable, effectively limiting messageV3 reads to FULL body reads.
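
The routing this implies can be sketched as follows, in Python for brevity. This is only an
illustration of the idea: FetchType, LOOKUP_TABLES and tables_to_read are hypothetical names, not
the James FetchGroup API, and the sketch assumes the added columns are populated.

```python
from enum import Enum
from typing import List

# The two lookup tables that receive the denormalized columns.
LOOKUP_TABLES = ("messageIdTable", "imapUidTable")


class FetchType(Enum):
    METADATA = "METADATA"
    HEADERS = "HEADERS"
    FULL = "FULL"


def tables_to_read(fetch_type: FetchType, lookup_table: str) -> List[str]:
    """Return the tables a read touches under the proposed data model."""
    if lookup_table not in LOOKUP_TABLES:
        raise ValueError(f"unknown lookup table: {lookup_table}")
    if fetch_type is FetchType.FULL:
        # Only full body reads still need the messagev3 row.
        return [lookup_table, "messagev3"]
    # METADATA and HEADERS are resolved from the denormalized columns alone.
    return [lookup_table]
```

With this routing, a HEADERS fetch by mailbox ID + UID touches messageIdTable only, while a FULL
fetch still reads messagev3.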

h3. Expectations

This will reduce the Cassandra query load for both JMAP and IMAP, speeding up James and
allowing us to scale to larger workloads on the exact same infrastructure.
A boost ranging from 25% to 33% is expected for IMAP, JMAP and POP3 workloads.

h3. Migration strategy

 - 1. The admin ALTERs the tables.
 - 2. The admin deploys the new version of James. Newly written data is then fully denormalized...
 - 3. But previously written data still needs messagev3 reads to be served (if the expected data
is not in messageIdTable or in imapUidTable, we know we need to read it from the messagev3 table).
 - 4. We propose a migration task that looks up messagev3 to populate the newly created
columns of messageIdTable and imapUidTable. This way an admin can ensure that previously existing
data fully benefits from the enhancement.
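
Steps 3 and 4 can be sketched as follows, using in-memory dictionaries in place of Cassandra
tables. This is a minimal illustration, not James code: read_metadata and migrate are hypothetical
names, and the sketch assumes messagev3 exposes the four fields under the same names as the added
columns.

```python
from typing import Any, Dict, List

# Columns added by the ALTER TABLE statements of the proposal.
DENORMALIZED = ("internalDate", "bodyStartOctet", "fullContentOctets", "headerContent")

Row = Dict[str, Any]


def read_metadata(lookup_row: Row, message_v3: Dict[str, Row]) -> Row:
    """Serve the denormalized fields, reading messagev3 only for old rows."""
    if all(lookup_row.get(column) is not None for column in DENORMALIZED):
        # New data: everything is already in the lookup table.
        return {column: lookup_row[column] for column in DENORMALIZED}
    # Step 3: data written before the upgrade still needs a messagev3 read.
    source = message_v3[lookup_row["messageid"]]
    return {column: source[column] for column in DENORMALIZED}


def migrate(lookup_rows: List[Row], message_v3: Dict[str, Row]) -> int:
    """Step 4: copy missing fields from messagev3; returns the rows updated."""
    updated = 0
    for row in lookup_rows:
        if any(row.get(column) is None for column in DENORMALIZED):
            source = message_v3[row["messageid"]]
            for column in DENORMALIZED:
                row[column] = source[column]
            updated += 1
    return updated
```

Running migrate a second time over the same rows updates nothing, so in this sketch the task can
be re-run safely.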

I think the classical migration strategy is not a good fit for this one, as:
 - Fallback mechanisms incur performance degradation (double the amount of reads during the
transition period) and message metadata query speed is critical. With the proposed strategy,
at worst the previous behavior applies during the transition period.
 - Creating and deleting tables is messy, whereas simple in-place modifications do not generate
data model garbage.
 - We can add a startup check to ensure the columns are present (and abort startup if
not).
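
The startup check could look roughly like the Python sketch below. It is only an assumption of how
such a check might work: the schema is passed in as a plain dictionary of table name to column
names, and missing_columns and check_schema are hypothetical helpers. Column names are lowercase
because Cassandra folds unquoted identifiers to lowercase, as the DESCRIBE output above shows.

```python
from typing import Dict, FrozenSet, Set

# Lowercase forms of the columns added by the ALTER TABLE statements.
REQUIRED_COLUMNS: FrozenSet[str] = frozenset(
    {"internaldate", "bodystartoctet", "fullcontentoctets", "headercontent"}
)
CHECKED_TABLES = ("messageidtable", "imapuidtable")


def missing_columns(schema: Dict[str, Set[str]]) -> Dict[str, FrozenSet[str]]:
    """Return, per checked table, the required columns that are absent."""
    missing = {}
    for table in CHECKED_TABLES:
        absent = REQUIRED_COLUMNS - schema.get(table, set())
        if absent:
            missing[table] = absent
    return missing


def check_schema(schema: Dict[str, Set[str]]) -> None:
    """Abort startup (here: raise) when the admin has not ALTERed the tables."""
    missing = missing_columns(schema)
    if missing:
        details = "; ".join(
            f"{table}: {', '.join(sorted(columns))}"
            for table, columns in sorted(missing.items())
        )
        raise RuntimeError(f"Missing denormalized columns, aborting startup: {details}")
```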


  was:
h3. The facts

Here is our message structure:

{code:java}
cqlsh:apache_james> DESCRIBE TABLE imapuidtable ;
CREATE TABLE apache_james.imapuidtable (
    messageid timeuuid,
    mailboxid timeuuid,
    uid bigint,
    flaganswered boolean,
    flagdeleted boolean,
    flagdraft boolean,
    flagflagged boolean,
    flagrecent boolean,
    flagseen boolean,
    flaguser boolean,
    modseq bigint,
    userflags set<text>,
    PRIMARY KEY (messageid, mailboxid, uid)
) WITH comment = 'Holds mailbox and flags for each message, lookup by message ID';

cqlsh:apache_james> DESCRIBE TABLE messageidtable  ;
CREATE TABLE apache_james.messageidtable (
    mailboxid timeuuid,
    uid bigint,
    flaganswered boolean,
    flagdeleted boolean,
    flagdraft boolean,
    flagflagged boolean,
    flagrecent boolean,
    flagseen boolean,
    flaguser boolean,
    messageid timeuuid,
    modseq bigint,
    userflags set<text>,
    PRIMARY KEY (mailboxid, uid)
) WITH comment = 'Holds mailbox and flags for each message, lookup by mailbox ID + UID';

cqlsh:apache_james> DESCRIBE TABLE messagev3  ;
CREATE TABLE apache_james.messagev3 (
    messageid timeuuid PRIMARY KEY,
    bodycontent text,
    bodyoctets bigint,
    bodystartoctet int, 
    attachments list<frozen<attachments>>,
   // and also message properties
) WITH comment = 'Holds message metadata, independently of any mailboxes. Content of messages
is stored in `blobs` and `blobparts` tables. Optimizes property storage compared to V2.';
{code}

A very common pattern is to access message headers.

 - imap-reorg.png (attached) shows me opening my IMAP mailbox after a long weekend. We can
see that my MUA lists the headers of the 108 messages received in the meantime. In order to
retrieve the storage information, the messagev3 table needs to be accessed for each message,
generating a huge count of PRIMARY KEY reads that are not strictly necessary. Reading messageV3
takes second place in query time.

 - Similar things happen on top of JMAP. jmap-reorg.png shows 2 webmail email list loads.
Same thing: for each message entry, we need to query messagev3 to retrieve the storage information
needed to retrieve the headers. Here messagev3 reads take first place, ahead of the message
metadata reads and the header reads.

h3. The bit of Cassandra philosophy we might have missed...

https://www.datastax.com/blog/basic-rules-cassandra-data-modeling

{code:java}
# Non-Goals

## Minimize the Number of Writes

Writes in Cassandra aren't free, but they're awfully cheap. Cassandra is optimized for high
write throughput, and almost all writes are equally efficient [1].

## Minimize Data Duplication

Denormalization and duplication of data is a fact of life with Cassandra. Don't be afraid
of it. [...] In order to get the most efficient reads, you often need to duplicate data.

# Basic goals

[...]

## Rule 2: Minimize the Number of Partitions Read

[...] Furthermore, even on a single node, it's more expensive to read from multiple partitions
than from a single one due to the way rows are stored.
{code}

https://thelastpickle.com/blog/2017/03/16/compaction-nuance.html

{code:java}
 An incorrect data model can turn a single query into hundreds of queries, resulting in increased
latency, decreased throughput, and missed SLAs.
{code}

(This one is from an article about compaction, but I feel it is very relevant to
the situation I describe, so I could not refrain from quoting it...)

h3. The new data-model

I propose to do the following:

{code:java}
cqlsh:apache_james> ALTER TABLE messageIdTable ADD internalDate timestamp ;
cqlsh:apache_james> ALTER TABLE messageIdTable ADD bodyStartOctet bigint  ;
cqlsh:apache_james> ALTER TABLE messageIdTable ADD fullContentOctets bigint  ;
cqlsh:apache_james> ALTER TABLE messageIdTable ADD headerContent text  ;

cqlsh:apache_james> ALTER TABLE imapUidTable ADD internalDate timestamp ;
cqlsh:apache_james> ALTER TABLE imapUidTable ADD bodyStartOctet bigint  ;
cqlsh:apache_james> ALTER TABLE imapUidTable ADD fullContentOctets bigint  ;
cqlsh:apache_james> ALTER TABLE imapUidTable ADD headerContent text  ;
{code}

That way we can easily resolve METADATA and HEADERS FetchGroups against both messageIdTable
and imapUidTable, effectively limiting messageV3 reads to FULL body reads.

h3. Expectations

This will reduce the Cassandra query load for both JMAP and IMAP, speeding up James and
allowing us to scale to larger workloads on the exact same infrastructure.
A boost ranging from 25% to 33% is expected for IMAP, JMAP and POP3 workloads.

h3. Migration strategy

 - 1. The admin ALTERs the tables.
 - 2. The admin deploys the new version of James. Newly written data is then fully denormalized...
 - 3. But previously written data still needs messagev3 reads to be served (if the expected data
is not in messageIdTable or in imapUidTable, we know we need to read it from the messagev3 table).
 - 4. We propose a migration task that looks up messagev3 to populate the newly created
columns of messageIdTable and imapUidTable. This way an admin can ensure that previously existing
data fully benefits from the enhancement.

I think the classical migration strategy is not a good fit for this one, as:
 - Fallback mechanisms incur performance degradation (double the amount of reads during the
transition period) and message metadata query speed is critical. With the proposed strategy,
at worst the previous behavior applies during the transition period.
 - Creating and deleting tables is messy, whereas simple in-place modifications do not generate
data model garbage.
 - We can add a startup check to ensure the columns are present (and abort startup if
not).



> Further denormalize Message entity?
> -----------------------------------
>
>                 Key: JAMES-3576
>                 URL: https://issues.apache.org/jira/browse/JAMES-3576
>             Project: James Server
>          Issue Type: Improvement
>          Components: IMAPServer, JMAP
>    Affects Versions: 3.6.0
>            Reporter: Benoit Tellier
>            Assignee: Antoine Duprat
>            Priority: Major
>              Labels: perf
>             Fix For: 3.7.0
>
>         Attachments: imap-reorg.png, jmap-reorg.png, poc_after_gatling.png, poc_after_glowroot.png,
poc_before_gatling.png, poc_before_glowroot.png
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org

