whimsical-commits mailing list archives

From Sam Ruby <ru...@apache.org>
Subject [whimsy.git] [1/1] Commit 29242b5: save first draft
Date Sun, 17 Jan 2016 23:19:48 GMT
Commit 29242b5c5cbcaadb12598d899be61fa2f5b1a38f:
    save first draft


Branch: refs/heads/master
Author: Sam Ruby <rubys@intertwingly.net>
Committer: Sam Ruby <rubys@intertwingly.net>
Pusher: rubys <rubys@apache.org>

------------------------------------------------------------
www/status/README.md                                         | +++++++++++++ 
------------------------------------------------------------
98 changes: 98 additions, 0 deletions.
------------------------------------------------------------


diff --git a/www/status/README.md b/www/status/README.md
new file mode 100644
index 0000000..3364673
--- /dev/null
+++ b/www/status/README.md
@@ -0,0 +1,98 @@
+Monitoring
+==========
+
+The state of whimsy is represented as a tree of named nodes.
+
+Nodes, names, and strings
+-------------------------
+
+Each major branch is produced by a [monitor](monitors).  Each monitor can return a tree of nodes,
+or a single String, or an array of Strings.  The name of the monitor is used as the name of the
+node produced.
+
+Leaf nodes consist of a String, an array of Strings, or a Hash where one element in the Hash has
+a key of `data` with a value of either a String or an array of Strings.
+
+Non-leaf nodes consist of a Hash where one element in the Hash has a key of `data` with a value
+that is a Hash of names and child nodes.
+
+Levels
+------
+
+Each node is associated with a status *level*.  Valid levels are `success`, `info`, `warning`
+and `danger`.  (These levels are modelled after Bootstrap [alerts](http://getbootstrap.com/components/#alerts)).
+
+Default level for valid leaf nodes is `success`.  Invalid leaf nodes (e.g., a node consisting
+of a `nil` value) have a level of `danger`.  Only leaf nodes that are in the form of a Hash can
+have levels.  Leaf nodes that are not Hashes will be normalized into a Hash with a `level` and
+`data`.
+
+Default level for a non-leaf node is the highest level among its child nodes (where `danger` > `warning`,
+`warning` > `info` and `info` > `success`).  Normally monitors will not assign level values for
+non-leaf nodes.
+
+Titles
+------
+
+Non-leaf nodes have a *title* describing the contents of the children.  Titles show up as tooltips in
+the browser.
+
+Default for title is either a list or a count of the names of child nodes with the highest status.
+Again, normally monitors will not assign title values for nodes.
+
+Text
+----
+
+Somewhat rare, but a node may have *text* which is used in place of the name of the node for
+display purposes (the name continues to be used to produce the anchor id for the element for
+linking purposes).
+
+Internally, exceptions returned by a monitor are converted to a leaf node with a name of
+`exception`, a title containing the exception, and data consisting of a stack traceback.
+
+Href
+----
+
+Leaf nodes may have an *href* which will be used as the target for the link used to display
+the contents of the leaf node (either a single String or an array of Strings).
+
+Mtime
+-----
+
+Anchors and the top of each major branch emanating from the root have an mtime which indicates
+when that data was last updated.  This is described in the Control Flow section below.
+
+Control Flow
+============
+
+Monitors are simple class methods.  Monitors are called no more often than once a minute,
+and are passed the normalized results of the previous call.
+
+Monitors are called in response to a ping, so should produce results in sub-second time
+in order to avoid the ping timing out.  (Currently monitors are called in series, but
+in the future this may change to threads so that statuses may be collected in parallel.)
+Monitors that perform activities that take a substantial amount of time may elect to do so
+less frequently than once a minute, and can take advantage of the `mtime` values to do
+so.
+
+Results are collected into a hash, and that hash is then normalized.  Normalization resolves
+default values for items like levels and titles recursively.
+
+The normalized status is written to disk as [status.json](status.json), and used as a
+response to pings that occur less than a minute after the previous status.
+
+Alerts
+======
+
+The Apache Software Foundation infrastructure team uses
+[Ping My Box](https://www.pingmybox.com/dashboard?location=470) to monitor status.
+A dozen+ servers around the world check status every 5 minutes or so, and will
+report failure results to the infrastructure [HipChat](http://infra.chat/)
+channel, and may in the future be configured to send pager alerts.
+
+While the full status for whimsy is represented as a tree of nodes, each
+assigned one of four levels, and containing either child nodes or one or
+more strings, all the infrastructure team is currently concerned with
+is a boolean status (`success` and `info` are treated as success, and
+`warning` and `danger` are treated as failure) and the computed title
+for the root node.
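
The level rules described in the README above can be illustrated with a short Ruby sketch. This is not code from the commit: the method names `normalize`, `leaf_data?` and `healthy?`, the use of string keys, and the fallback to `success` for a node with no children are all assumptions made for illustration. It only covers level defaults and the boolean view used for alerts, and it assumes every explicitly assigned level is one of the four valid values.

```ruby
# Hypothetical sketch; names and key types are illustrative, not taken from whimsy.
LEVELS = %w[success info warning danger].freeze

# True for the two plain leaf forms: a String or an array of Strings.
def leaf_data?(value)
  value.is_a?(String) || (value.is_a?(Array) && value.all? { |item| item.is_a?(String) })
end

# Normalize a raw monitor result into a Hash with 'level' and 'data'.
def normalize(node)
  # Bare Strings and arrays of Strings are valid leaves: wrap them in a Hash.
  return { 'level' => 'success', 'data' => node } if leaf_data?(node)

  # Anything else that is not a Hash with 'data' (e.g. nil) is an invalid leaf.
  unless node.is_a?(Hash) && node.key?('data')
    return { 'level' => 'danger', 'data' => node.inspect }
  end

  result = node.dup
  if result['data'].is_a?(Hash)
    # Non-leaf: normalize the children, then default to the highest child level.
    result['data'] = result['data'].transform_values { |child| normalize(child) }
    child_levels = result['data'].values.map { |child| child['level'] }
    highest = child_levels.max_by { |level| LEVELS.index(level) }
    result['level'] ||= highest || 'success' # assumption: no children means success
  else
    # Leaf in Hash form: keep an explicit level, otherwise apply the default.
    result['level'] ||= leaf_data?(result['data']) ? 'success' : 'danger'
  end
  result
end

# Boolean view used for alerts: success and info pass, warning and danger fail.
def healthy?(status)
  %w[success info].include?(status['level'])
end
```

For example, a tree containing one `nil` leaf normalizes to a root level of `danger`, so `healthy?` returns `false` for it even when every other child is `success`.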
