Is there any high-availability support at the Spark driver level? For example, if a failure happens, can the driver move to another node and continue execution? I can see that RDD checkpointing helps serialize the state of an RDD, and I can imagine loading the checkpoint from another node when an error happens, but it seems that would lose track of all task statuses, and even the executor information maintained in the SparkContext. I am not sure whether there is anything existing I can leverage for this. Thanks for any suggestions.

Best Regards

Jun Feng Liu
IBM China Systems & Technology Laboratory in Beijing

Phone: 86-10-82452683

BLD 28, ZGC Software Park
No.8 Rd. Dong Bei Wang West, Dist. Haidian, Beijing 100193