james-server-dev mailing list archives

From "Benoit Tellier (Jira)" <server-...@james.apache.org>
Subject [jira] [Commented] (JAMES-3107) Log request when P99 is exceeded
Date Sun, 16 May 2021 04:29:00 GMT

    [ https://issues.apache.org/jira/browse/JAMES-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17345621#comment-17345621
] 

Benoit Tellier commented on JAMES-3107:
---------------------------------------

https://github.com/apache/james-project/pull/434 (includes detailed performance insights)

Using metrics to capture p99 leads to over-snapshotting and incurs a 33% throughput penalty
on JMAP draft.

Other slow-request logging implementations rely on manual time
measurements (e.g. the DataStax Cassandra driver).
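
To illustrate what "manual time measurements" can look like, here is a
minimal sketch in Java. It is not the James logP99 API and not the DataStax
driver's code: the SlowLog class, its timed() method and the log message
format are hypothetical names chosen for this example. It reads
System.nanoTime() before and after the operation and reports only when a
fixed threshold is exceeded, so no percentile snapshot is taken.

```java
import java.time.Duration;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical helper for illustration only; not part of James.
final class SlowLog {
    private final Duration threshold;
    private final Consumer<String> sink; // e.g. logger::info in real code

    SlowLog(Duration threshold, Consumer<String> sink) {
        this.threshold = threshold;
        this.sink = sink;
    }

    // Runs the operation and measures elapsed time with System.nanoTime().
    // A message is sent to the sink only when the threshold is exceeded.
    <T> T timed(String name, Supplier<T> operation) {
        long start = System.nanoTime();
        try {
            return operation.get();
        } finally {
            long elapsedNanos = System.nanoTime() - start;
            if (elapsedNanos > threshold.toNanos()) {
                sink.accept(String.format("SLOW %s took %d ms (threshold %d ms)",
                    name, elapsedNanos / 1_000_000, threshold.toMillis()));
            }
        }
    }
}
```

The per-call cost here is two clock reads and one comparison. The trade-off
is that the threshold is static rather than derived from an observed p99.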

As such, given the low benefit and high cost, I propose
to deprecate the logP99 API and migrate away from it.

Tools like Glowroot can capture slow traces at a
lower cost and should be used instead.

> Log request when P99 is exceeded
> --------------------------------
>
>                 Key: JAMES-3107
>                 URL: https://issues.apache.org/jira/browse/JAMES-3107
>             Project: James Server
>          Issue Type: New Feature
>          Components: Metrics
>            Reporter: Benoit Tellier
>            Priority: Major
>             Fix For: 3.5.0
>
>
> Given our current tooling, I struggle to correctly review slow requests from James.
> My current procedure is:
>   - In Grafana, identify the timestamp of a spike
>   - Grok logs in Kibana until I find something that could correspond
>   - Pray and hope my analysis stands.
> This is time-consuming, hard to do and unreliable.
> Identifying slow queries is important as it can point us to the critical paths to optimize.
> Hence I propose to log an info message when p99 is exceeded for high-level functions (JMAP
> methods, IMAP processors, matcher, mailet and overall processing, mailbox listeners, and remote
> delivery).
> In order to avoid log spamming, I propose to only log when a function-specified threshold
> is exceeded (defaulting to 100ms).
> I believe it will help us come up with more meaningful performance analyses and better
> fixes for the greater good of our production platforms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: server-dev-unsubscribe@james.apache.org
For additional commands, e-mail: server-dev-help@james.apache.org

