Still no luck running a purpose-built Spark 1.3 against HDP 2.2 after following all the instructions. Has anyone else faced this issue?

On Mon, Mar 16, 2015 at 8:53 PM, Bharath Ravi Kumar <reachbach@gmail.com> wrote:
Hi Todd,

Thanks for the help. I'll try again after building a distribution from the 1.3 sources. However, I wanted to confirm what I mentioned earlier: is it sufficient to copy the distribution only to the client host from which spark-submit is invoked (with spark.yarn.jar set), or does the entire distribution need to be pre-deployed on every host in the YARN cluster? I'd assume the latter shouldn't be necessary.
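For reference, this is roughly how I'm pointing spark-submit at the assembly; the HDFS path and jar name below are placeholders for my setup, not anything prescribed by the docs:

  # upload the assembly once so YARN can localize it on each node
  hdfs dfs -put lib/spark-assembly-1.3.0-hadoop2.6.0.jar hdfs:///apps/spark/

  # conf/spark-defaults.conf on the client host
  spark.yarn.jar hdfs:///apps/spark/spark-assembly-1.3.0-hadoop2.6.0.jar

My understanding is that with spark.yarn.jar pointing at an HDFS location, YARN distributes the assembly to the containers itself, which is why I'd expect the client-only copy to suffice.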

On Mon, Mar 16, 2015 at 8:38 PM, Todd Nist <tsindotg@gmail.com> wrote:
Hi Bharath,

I ran into the same issue a few days ago; here is a link to a post on the Hortonworks forum: http://hortonworks.com/community/forums/search/spark+1.2.1/

In case anyone else needs to do this, these are the steps I took to get it working with Spark 1.2.1 as well as Spark 1.3.0-RC3:

1. Pull the 1.2.1 source
2. Apply the following patches:
a. Address the Jackson version: https://github.com/apache/spark/pull/3938
b. Address propagation of the hdp.version set in spark-defaults.conf (see the config sketch after this list): https://github.com/apache/spark/pull/3409
3. Build with $SPARK_HOME/make-distribution.sh --name hadoop2.6 --tgz -Pyarn -Phadoop-2.4 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests package
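For step 2b, the part that tripped me up was making sure hdp.version actually reaches both the driver and the AM, since the HDP classpaths reference ${hdp.version}. Roughly what I ended up with in conf/spark-defaults.conf; the version string 2.2.0.0-2041 is only an example, so substitute whatever hdp-select reports on your cluster:

  spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041
  spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041

(spark.yarn.am.extraJavaOptions only exists from 1.3 on; for 1.2.1 I believe I also needed a conf/java-opts file containing the same -Dhdp.version flag, per the HDP tutorial below.)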

Then deploy the resulting artifact (spark-1.2.1-bin-hadoop2.6.tgz) following the instructions in the HDP Spark preview: http://hortonworks.com/hadoop-tutorial/using-apache-spark-hdp/
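As a quick smoke test after deploying, something along these lines worked for me; the executor settings and the examples jar name are just what my build produced, so adjust for yours:

  ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master yarn-cluster \
    --num-executors 2 --executor-memory 512m \
    lib/spark-examples-1.2.1-hadoop2.6.0.jar 10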

FWIW, Spark 1.3.0 appears to be working fine with HDP as well, and steps 2a and 2b are not required.

HTH

-Todd


On Mon, Mar 16, 2015 at 10:13 AM, Bharath Ravi Kumar <reachbach@gmail.com> wrote:
Hi,

Trying to run Spark (1.2.1 built for HDP 2.2) against a YARN cluster results in the AM failing to start, with the following error on stderr:
Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher
An application id was assigned to the job, but there were no logs. Note that the Spark distribution has not been "installed" on every host in the cluster; the aforementioned build was only copied to one of the Hadoop client hosts, from which the job was launched. spark-submit was run with --master yarn-client, and spark.yarn.jar was set to the assembly jar from that distribution. Switching to the HDP-recommended Spark distribution and following the instructions on this page did not fix the problem either. Any idea what may have caused this error?
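For concreteness, the invocation looked roughly like this; the application jar, class name, and paths are placeholders standing in for our actual job:

  ./bin/spark-submit --master yarn-client \
    --conf spark.yarn.jar=/opt/spark-1.2.1-bin-hadoop2.6/lib/spark-assembly-1.2.1-hadoop2.6.0.jar \
    --class com.example.MyJob /path/to/my-job.jar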

Thanks,
Bharath