samza-dev mailing list archives

From Abhishek Shivanna <abks...@gmail.com>
Subject [DISCUSS] SEP-3: Heart-beat mechanism between JobCoordinator and all running containers
Date Tue, 25 Apr 2017 01:42:05 GMT

Hi Everyone,

To fix the orphaned/leaked containers seen when the YARN NodeManager
crashes, I have created a SEP describing the design for a heartbeat
between the containers and the JobCoordinator:
https://cwiki.apache.org/confluence/display/SAMZA/SEP-3%3A+Heart-beat+mechanism+between+JobCoordinator+and+all+running+containers

Please take a look and provide feedback. I would also really appreciate
help in designing a way to propagate the error up from SamzaContainer in
order to exit the container with a non-zero exit code.

Thanks,
Abhishek
