james-server-dev mailing list archives

From "Tellier Benoit (JIRA)" <server-...@james.apache.org>
Subject [jira] [Closed] (JAMES-2390) JMAP attachment performance issues
Date Tue, 08 May 2018 02:20:00 GMT

     [ https://issues.apache.org/jira/browse/JAMES-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tellier Benoit closed JAMES-2390.
---------------------------------

> JMAP attachment performance issues
> ----------------------------------
>
>                 Key: JAMES-2390
>                 URL: https://issues.apache.org/jira/browse/JAMES-2390
>             Project: James Server
>          Issue Type: New Feature
>          Components: cassandra, JMAP
>    Affects Versions: master
>            Reporter: Tellier Benoit
>            Assignee: Antoine Duprat
>            Priority: Major
>              Labels: perfomance
>         Attachments: Capture d’écran de 2018-05-06 19-32-31.png, Capture d’écran de 2018-05-06 19-35-02.png
>
>
> Most of the Cassandra failures are related to attachment downloads, and more precisely
to attachment rights checking.
> Looking at the attached screenshots:
>  - We can see that a lot of warnings are generated by JMAP attachment downloads.
>  - These failures happen when reading metadata, in order to retrieve the list of referencing
messages to resolve rights.
>  - Furthermore, we can see that the failure is systematic for some attachments.
> I spent a bit of time this weekend analysing these (unexpected!) performance issues. I
found two straightforward performance improvements, as well as a more complex one.
>  - 1. When checking whether a set of messages is accessible, the rights of the containing
mailbox were checked on a per-message basis. This is sub-optimal, as several messages might be
in the same mailbox, whose rights will then be needlessly checked several times.
> This change fits smoothly into the codebase: the tooling for checking rights once per
mailbox is already implemented, just not used in that case.
>  - 2. Paging and asynchronous code don't combine well, as already shown by previous code.
The mantra is *join then collect*. If the operations are done in the reverse order and the
entries exceed the paging size (~5000), an exception will be thrown by the Cassandra driver.
> This explains the systematic failures for some specific attachments... The fix is trivial,
and I added a test to demonstrate this.
>  - 3. The given logs suggest that we have high cardinality rows in our database (i.e. an
attachment referenced by many messages), as the number of referencing messages exceeds
5000 (enough to trigger the paging issues).
> Such a high cardinality has a massive read cost:
>  - Reading such a row is a complex operation
>  - Caching cannot help, as the cache size per primary key is exceeded
>  - Rights would be resolved for each referencing message, generating an expensive read
cascade.
> Note that deduplication is done at the Attachment level. Looking at the attachment
names (cf. screenshots), we can see that these "high cardinality" attachments look like inlined
images in signatures...
> The stance here is that deduplication is not a concern for attachments, but for blobs.
We should push this constraint further down the stack. That way, each blob would
be deduplicated (storage cost reduction, higher FS cache efficiency, etc.) while avoiding
*wide rows*.
> We should ensure each newly generated AttachmentId is unique, then generate BlobId from
the blob's content, to avoid wide rows while keeping deduplication in place.
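
A minimal sketch of the proposal above, not the actual James implementation: each stored attachment gets a fresh random AttachmentId, while the BlobId is derived from the content, so identical bytes are stored once. The class `AttachmentStoreSketch`, its methods, the in-memory maps and the choice of SHA-256 are all assumptions made for illustration.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch: names and storage are illustrative, not the James API. */
final class AttachmentStoreSketch {
    // blobId -> content: identical content is stored only once (blob-level deduplication)
    private final Map<String, byte[]> blobs = new HashMap<>();
    // attachmentId -> blobId: one narrow entry per attachment, so no wide row builds up
    private final Map<String, String> attachmentToBlob = new HashMap<>();

    /** Derives the BlobId from the content (SHA-256 is an assumed choice of hash). */
    static String blobIdOf(byte[] content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest(content)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    /** Stores the content and returns a newly generated, unique AttachmentId. */
    String store(byte[] content) {
        String blobId = blobIdOf(content);
        blobs.putIfAbsent(blobId, content.clone());
        String attachmentId = UUID.randomUUID().toString();
        attachmentToBlob.put(attachmentId, blobId);
        return attachmentId;
    }

    String blobIdFor(String attachmentId) {
        return attachmentToBlob.get(attachmentId);
    }

    int storedBlobCount() {
        return blobs.size();
    }
}
```

Storing the same signature image twice in this sketch yields two distinct attachment ids pointing at a single blob, which is the property the proposal relies on.
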
> Note that since this only applies to newly received messages, it can be done transparently,
without the need for a migration.
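
The first improvement in the list above (resolving rights once per mailbox rather than once per message) can be sketched as follows. This is only an illustration under assumed names: `RightsCheckSketch`, `MessageRef` and the `canReadMailbox` predicate are hypothetical and do not reflect the existing James tooling mentioned in the issue.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch: types and names are illustrative, not the James API. */
final class RightsCheckSketch {
    /** A message reference: its id and the id of the mailbox containing it. */
    static final class MessageRef {
        final String messageId;
        final String mailboxId;

        MessageRef(String messageId, String mailboxId) {
            this.messageId = messageId;
            this.mailboxId = mailboxId;
        }
    }

    /**
     * Returns the ids of the accessible messages, evaluating the (potentially
     * expensive) rights check at most once per distinct mailbox.
     */
    static Set<String> accessibleMessages(List<MessageRef> messages,
                                          Predicate<String> canReadMailbox) {
        // Remembers the outcome of the rights check for each mailbox already seen
        Map<String, Boolean> rightsByMailbox = new HashMap<>();
        Set<String> accessible = new LinkedHashSet<>();
        for (MessageRef message : messages) {
            boolean readable =
                rightsByMailbox.computeIfAbsent(message.mailboxId, canReadMailbox::test);
            if (readable) {
                accessible.add(message.messageId);
            }
        }
        return accessible;
    }
}
```

With three messages spread over two mailboxes, the predicate in this sketch is evaluated twice instead of three times; the saving grows with the number of messages sharing a mailbox.
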



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org

