spark-user mailing list archives

From Daniel Darabos
Subject Task duration graph on Spark stage UI
Date Thu, 06 Nov 2014 17:04:40 GMT
Even though the stage UI has min, 25th%, median, 75th%, and max durations,
I am often still left clueless about the distribution. For example, 100 out
of 200 tasks (started at the same time) have completed in 1 hour. How much
longer do I have to wait? I cannot guess well based on the five numbers.
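
To illustrate why the five numbers are not enough, here is a minimal sketch with made-up durations (in minutes). The helper name `fiveNumbers` and its nearest-index quantile rule are assumptions for illustration, not necessarily what the stage UI computes. Both samples give the same summary even though one rises evenly and the other is clustered in steps.

```javascript
// Illustrative helper: picks min, 25th percentile, median, 75th percentile
// and max by nearest index from an already sorted array.
function fiveNumbers(sorted) {
  var n = sorted.length - 1;
  return [0, 0.25, 0.5, 0.75, 1].map(function (q) {
    return sorted[Math.round(q * n)];
  });
}

// Made-up task durations in minutes, already sorted.
var steady  = [10, 20, 30, 40, 50, 60, 70, 80, 90];  // evenly spread
var stepped = [10, 30, 30, 30, 50, 70, 70, 70, 90];  // clustered in steps

console.log(fiveNumbers(steady).join(', '));
console.log(fiveNumbers(stepped).join(', '));
```

Both calls print the same five values, so the summary alone cannot tell the two shapes apart.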

A graph of the durations will not answer the question either, but I think
it gives a better idea. I can hopefully see if the distribution is linearly
sloped or bimodal or exponentially slowing down, etc.

It's easy to draw this graph, so I set it up as a Chrome extension:

And here's the complete source code that you can throw in the JavaScript
console for the same results:

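// Read the sort key of every cell in the 8th column (assumed to be the
// task duration) of the third table on the page.
var x = $('table:eq(2)').find('td:nth-child(8)').map(function (i, e) {
  return parseInt($(e).attr('sorttable_customkey'), 10);
}).get();  // .get() turns the jQuery object into a plain array
x.sort(function(a, b) { return a - b; });
var w = x.length;
var h = x[w - 1];  // longest duration sets the vertical scale
var W = 180;
var H = 80;
var canvas = $('<canvas width="' + W + '" height="' + H + '">');
canvas.css({ position: 'absolute', top: '100px', left: '500px' });
var ctx = canvas[0].getContext('2d');
ctx.fillStyle = 'orange';
ctx.beginPath();
ctx.moveTo(0, H);
for (var i = 0; i < w; ++i) {
  ctx.lineTo(i * W / (w - 1), H - x[i] * H / h);
}
ctx.lineTo(W, H);
// Fill the area under the curve and attach the canvas to the page.
ctx.fill();
$('body').append(canvas);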
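
The geometry can also be checked without a browser. The following is a rough sketch, not part of the extension: `durationCurve` is a hypothetical helper that returns the curve's vertices for a W x H canvas, using the same scaling as the loop above. The guards for a single task and for all-zero durations are my own additions.

```javascript
// Hypothetical helper: maps task durations to the [x, y] vertices of the
// sorted-duration curve on a W x H canvas (y grows downward, as on a canvas).
function durationCurve(durations, W, H) {
  var x = durations.slice().sort(function (a, b) { return a - b; });
  var w = x.length;
  var h = x[w - 1];  // longest duration sets the vertical scale
  var points = [];
  for (var i = 0; i < w; ++i) {
    // A single task has no horizontal extent; pin it to the left edge.
    var px = w > 1 ? i * W / (w - 1) : 0;
    // All-zero durations would divide by zero; keep them on the baseline.
    var py = h > 0 ? H - x[i] * H / h : H;
    points.push([px, py]);
  }
  return points;
}

// Three made-up durations on the 180 x 80 canvas used above.
var pts = durationCurve([300, 100, 200], 180, 80);
console.log(JSON.stringify(pts));
```

The shortest task lands at the left edge and the longest at the top right corner, so a straight diagonal means evenly spread durations and a plateau means many tasks of similar length.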

It should not be much work to add this to the stage status page itself
either, if there is interest.
