spark-user mailing list archives

From Brandon Geise <brandonge...@gmail.com>
Subject Re: How to address seemingly low core utilization on a spark workload?
Date Thu, 15 Nov 2018 15:27:00 GMT
I recently came across this (haven’t tried it out yet), but maybe it can help you identify the root cause:

https://github.com/groupon/sparklint
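
For reference, here is roughly how it gets wired in, based on my reading of the sparklint README (I haven't run this, so double-check the listener class name and the matching artifact coordinates against the repo):

    import org.apache.spark.sql.SparkSession

    // Attach sparklint as an extra SparkListener so it records task and core
    // utilization while the job runs. The listener class name is taken from
    // my reading of the sparklint README and may vary by version -- verify
    // it against the repo before relying on it.
    val spark = SparkSession.builder()
      .appName("utilization-probe")  // hypothetical app name, for illustration only
      .config("spark.extraListeners", "com.groupon.sparklint.SparklintListener")
      .getOrCreate()

The sparklint jar also has to be on the classpath, e.g. via spark-submit --packages, with the coordinates listed in the repo.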

From: Vitaliy Pisarev <vitaliy.pisarev@biocatch.com>
Date: Thursday, November 15, 2018 at 10:08 AM
To: user <user@spark.apache.org>
Cc: David Markovitz <Dudu.Markovitz@microsoft.com>
Subject: How to address seemingly low core utilization on a spark workload?

I have a workload that runs on a cluster of 300 cores. 

Below is a plot of the amount of active tasks over time during the execution of this workload:

[inline image: plot of active task count over time; not preserved in the archive]

From this plot I deduce that there are substantial intervals where the cores are heavily under-utilised.

What actions can I take to:

- Increase the efficiency (== core utilisation) of the cluster?
- Understand the root causes behind the drops in core utilisation? (One quick check I can already run is sketched below.)
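
For reference, a minimal sketch of that quick check, assuming the job is DataFrame-based (reportParallelism is a throwaway helper name, not a Spark API): a stage over a DataFrame with fewer partitions than the cluster has cores can never saturate the 300 cores, so comparing the two is cheap and often telling.

    import org.apache.spark.sql.DataFrame

    // A stage over a DataFrame with fewer partitions than the cluster has
    // cores can never keep every core busy, so this is a cheap first
    // diagnostic. 300 matches the cluster size mentioned above;
    // reportParallelism is a hypothetical helper, not a Spark API.
    def reportParallelism(df: DataFrame, totalCores: Int = 300): Unit = {
      val partitions = df.rdd.getNumPartitions
      if (partitions < totalCores)
        println(s"only $partitions partitions for $totalCores cores; " +
          "consider repartition() or raising spark.sql.shuffle.partitions")
      else
        println(s"$partitions partitions should be enough to occupy $totalCores cores")
    }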

