Hello Stephen, 

My goal was to run Spark on a cluster that already had Spark and Hadoop installed, so the right thing to do was to mark these dependencies as "provided" in my Spark build. I wrote a blog post about it in case it helps.

Here is the set of lines that changed my life:
 
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.0.0-mr1-cdh4.4.0" % "provided"

libraryDependencies += "org.apache.hadoop" % "hadoop-core" % "2.0.0-mr1-cdh4.4.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.0-incubating" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-mllib" % "0.9.0-incubating" % "provided"

HTH,
Shivani
PS: @Koert: I think what happened was that the akka packaged with Spark was overriding the akka configuration parameters set by my explicit akka dependencies. Since Spark already comes with akka, one should probably not specify akka dependencies explicitly.
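
If another library pulls akka in transitively, here is a sketch of how it can be excluded so that only Spark's akka is left on the classpath (the "some-library" dependency is hypothetical, not from my actual build):

libraryDependencies += ("com.example" %% "some-library" % "1.0")
  .excludeAll(ExclusionRule(organization = "com.typesafe.akka"))  // drop transitive akka jars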


On Fri, May 2, 2014 at 5:21 AM, Koert Kuipers <koert@tresata.com> wrote:

Not sure why applying concat to reference.conf didn't work for you. Since it simply concatenates the files, the key akka.version should be preserved. We had the same situation for a while without issues.

On May 1, 2014 8:46 PM, "Shivani Rao" <raoshivani@gmail.com> wrote:
Hello Koert,

That did not work; I mentioned that in my email already. But I found a way around it by excluding the akka dependencies.

Shivani


On Tue, Apr 29, 2014 at 12:37 PM, Koert Kuipers <koert@tresata.com> wrote:
You need to merge the reference.conf files, and then it's no longer an issue.

See the build file for Spark itself:
  case "reference.conf" => MergeStrategy.concat


On Tue, Apr 29, 2014 at 3:32 PM, Shivani Rao <raoshivani@gmail.com> wrote:
Hello folks,

I was going to post this question to the Spark user group as well. If you have any leads on how to solve this issue, please let me know:

I am trying to build a basic Spark project (Spark depends on akka) and to create a fat jar using sbt assembly. The goal is to run the fat jar via the command line as follows:
 java -cp "path to my spark fatjar" mainclassname

During sbt assembly, I encountered deduplication errors between the following akka jars:
akka-remote_2.10-2.2.3.jar and akka-remote_2.10-2.2.3-shaded-protobuf.jar
akka-actor_2.10-2.2.3.jar and akka-actor_2.10-2.2.3-shaded-protobuf.jar

I resolved them by using MergeStrategy.first, which let the sbt assembly command complete successfully. But at runtime, one or another of the akka configuration parameters kept failing with the following message:

"Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key"

I then used MergeStrategy.concat for "reference.conf", and I started getting this error repeatedly:

Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'. 

I noticed that akka.version appears only in the akka-actor jar's reference.conf, not in akka-remote's. The reference.conf in my final fat jar does not contain akka.version either, so the concat strategy is not working for me.

There are several things I could try:

a) Use the sbt-proguard plugin: https://github.com/sbt/sbt-proguard

b) Write a Build.scala that handles the merging of reference.conf

c) Create a reference.conf by merging all the akka configurations myself and then pass it on the java command line as shown below (see the sketch after this list):

java -cp <jar-name> -Dconfig.file=<config> mainclassname
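
For option c), a small sketch of how I would check that the key resolves once an external file is passed in with -Dconfig.file (the ConfigCheck object is made up for illustration):

import com.typesafe.config.ConfigFactory

// With -Dconfig.file=<path>, Typesafe Config loads that file in place of
// application.conf and still falls back to any reference.conf on the classpath.
object ConfigCheck {
  def main(args: Array[String]): Unit = {
    val config = ConfigFactory.load()
    println(config.getString("akka.version"))  // throws ConfigException.Missing if the key is absent
  }
}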

The main issue is that if I run the Spark jar with "sbt run", there are no errors in accessing any of the akka configuration parameters. It is only when I run it via the command line (java -cp <jar-name> classname) that I encounter the error.

Which of these is a long-term fix for the akka issues? For now, I removed the akka dependencies and that solved the problem, but I know that is not a long-term solution.

Regards,
Shivani

--
Software Engineer
Analytics Engineering Team @ Box
Mountain View, CA



