I have a long-running job that takes hours to process data. How can I monitor the operational efficiency of this job? I am interested in something like Storm/Flink-style user metrics/aggregators, which I can monitor while my job is running. Using these metrics I want to track per-partition performance in processing items. As of now, the only way for me to get these metrics is when the job finishes.
One possibility is for Spark to flush the metrics to an external system every few seconds, and then use that external system to monitor them. However, I wanted to see if Spark supports any such use case out of the box (OOB).
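To illustrate the periodic-flush approach I have in mind, here is a minimal sketch in plain Python (not Spark-specific; the sink callable and `MetricsReporter` name are my own invention, standing in for whatever external system would receive the metrics):

```python
import threading
from collections import defaultdict


class MetricsReporter:
    """Collects per-partition counters and periodically flushes
    snapshots of them to an external sink."""

    def __init__(self, sink, interval=5.0):
        self.sink = sink                     # callable that receives a dict snapshot
        self.interval = interval             # seconds between flushes
        self.counters = defaultdict(int)     # partition_id -> items processed
        self.lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def incr(self, partition_id, n=1):
        """Called from the processing code whenever items are handled."""
        with self.lock:
            self.counters[partition_id] += n

    def flush(self):
        """Push a consistent snapshot of all counters to the sink."""
        with self.lock:
            snapshot = dict(self.counters)
        self.sink(snapshot)

    def _run(self):
        # wait() returns False on timeout, so this flushes every `interval`
        # seconds until stop() is called
        while not self._stop.wait(self.interval):
            self.flush()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
        self.flush()  # final flush so no counts are lost
```

Usage would look like `reporter = MetricsReporter(push_to_graphite); reporter.start()`, with workers calling `reporter.incr(partition_id)` as they process items. The question is whether Spark already provides this plumbing natively, so I don't have to maintain it myself.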