I guess he used client mode, and the local Spark version is 1.5.2 while the standalone cluster's Spark version is 1.5.1. In other words, he used a 1.5.2 driver to talk to 1.5.1 executors.

So I'm a little confused as to exactly how this might have happened, but one quick guess is that you may have built an assembly jar that includes Spark core. Can you mark it as provided and/or post your build file?
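For an sbt build, marking Spark as provided might look like the sketch below. It assumes sbt with the sbt-assembly plugin producing the fat jar; the version number simply mirrors the one in this thread. With `provided`, the code still compiles against spark-core, but the Spark classes are left out of the assembly, so at runtime the application uses whichever Spark jars the cluster (or `spark-submit`) supplies. In a Maven build the equivalent is `<scope>provided</scope>` on the Spark dependency.

```scala
// build.sbt (sketch, assuming sbt + sbt-assembly)
// Compile against Spark 1.5.2, but keep Spark out of the assembly jar so the
// Spark installation that launches the job provides its own classes at runtime.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.2" % "provided"
```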

I logged SPARK-13084

For the moment, please consider running with 1.5.2 on all the nodes.

I agree with you, Ted: if RDD had a serial version UID, this might not be an issue. That could be a JIRA worth submitting to help avoid version mismatches in future Spark versions, but it doesn't help my current situation between 1.5.1 and 1.5.2.

Any other ideas? Thanks.
I am not a Scala expert.

RDD extends Serializable but doesn't have a @SerialVersionUID annotation.
This may explain what you described.

One approach is to add @SerialVersionUID so that RDDs have a stable serial version UID.
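A minimal sketch of what the annotation does, using two hypothetical classes (`PinnedRecord`, `UnpinnedRecord`) in place of RDD. With `@SerialVersionUID`, the Scala compiler emits a fixed `serialVersionUID` field; without it, the JVM computes one from the class's structure, which is where numbers like those in the exception come from. `java.io.ObjectStreamClass` reports whichever value is in effect.

```scala
import java.io.ObjectStreamClass

// Hypothetical stand-in for a serializable type such as RDD, with a pinned UID.
// The compiler emits `serialVersionUID = 1L`, so every build agrees on it.
@SerialVersionUID(1L)
class PinnedRecord(val id: Int) extends Serializable

// Same shape, no annotation: the JVM derives the UID from the class structure,
// so two builds that differ in that structure can end up with different UIDs.
class UnpinnedRecord(val id: Int) extends Serializable

object SerialUidDemo {
  // Returns the serialVersionUID the JVM uses for `cls` (declared or computed).
  def uidOf(cls: Class[_]): Long =
    ObjectStreamClass.lookup(cls).getSerialVersionUID
}
```

Assuming Spark is on the classpath, calling `SerialUidDemo.uidOf(classOf[org.apache.spark.rdd.RDD[_]])` once with 1.5.1 jars and once with 1.5.2 jars should show whether the two builds disagree, as the "stream classdesc" and "local class" values in the exception do. A pinned UID only removes this particular check; the serialized fields still have to line up between versions.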


I've searched through the mailing list archive. It seems that if you try to run, for example, a Spark 1.5.2 program against a Spark 1.5.1 standalone server, you will run into an exception like this:

WARN  org.apache.spark.scheduler.TaskSetManager  - Lost task 0.0 in stage 0.0 (TID 0, java.io.InvalidClassException: org.apache.spark.rdd.RDD; local class incompatible: stream classdesc serialVersionUID = -3343649307726848892, local class serialVersionUID = -3996494161745401652

If my application uses a library that is built against Spark 1.5.2, does that mean my application is now tied to that same Spark standalone server version?

Is there a recommended way for that library to depend on Spark but remain compatible with a wider set of versions, i.e. any 1.5.x version?