spark-user mailing list archives

From Shivani Rao <raoshiv...@gmail.com>
Subject Re: Spark: issues with running a sbt fat jar due to akka dependencies
Date Fri, 02 May 2014 18:35:32 GMT
Hello Stephen,

My goal was to run Spark on a cluster that already had Spark and Hadoop
installed, so the right thing to do was to mark these dependencies as
"provided" in my build. I wrote a blog post about it in case it helps others:
http://myresearchdiaries.blogspot.com/2014/05/building-apache-spark-jars.html

Here is the set of lines that changed my life

libraryDependencies += "org.apache.hadoop" % "hadoop-client" %
"2.0.0-mr1-cdh4.4.0" % "provided"

libraryDependencies += "org.apache.hadoop" % "hadoop-core" %
"2.0.0-mr1-cdh4.4.0" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-core" %
"0.9.0-incubating" % "provided"

libraryDependencies += "org.apache.spark" %% "spark-mllib" %
"0.9.0-incubating" % "provided"
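
For completeness, a sketch of the matching sbt-assembly settings (this assumes
the older sbt-assembly 0.11.x API, where the key is `mergeStrategy in assembly`;
later plugin versions renamed it to `assemblyMergeStrategy`):

```scala
// build.sbt sketch -- adjust keys to your sbt-assembly version
import AssemblyKeys._

assemblySettings

mergeStrategy in assembly <<= (mergeStrategy in assembly) { old =>
  {
    // concatenate all reference.conf files so no akka keys are lost
    case "reference.conf" => MergeStrategy.concat
    // fall back to the plugin's default strategy for everything else
    case x => old(x)
  }
}
```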

HTH,
Shivani
PS: @Koert: I think what happened was that the akka that comes packaged
with Spark was overriding the configuration parameters set by my explicit akka
dependencies. Since Spark already bundles akka, one should probably not
declare akka dependencies explicitly.
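
If some library you depend on pulls akka in transitively, one way to keep it
out of the fat jar is an exclusion rule; a sketch (the "com.example"
coordinates below are a hypothetical placeholder, not a real library):

```scala
// build.sbt sketch: strip transitively-pulled akka artifacts
// "com.example" %% "some-library" is a hypothetical dependency
libraryDependencies += "com.example" %% "some-library" % "1.0" excludeAll(
  ExclusionRule(organization = "com.typesafe.akka")
)
```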


On Fri, May 2, 2014 at 5:21 AM, Koert Kuipers <koert@tresata.com> wrote:

> not sure why applying concat to reference.conf didn't work for you. since
> it simply concatenates the files, the key akka.version should be preserved.
> we had the same situation for a while without issues.
>  On May 1, 2014 8:46 PM, "Shivani Rao" <raoshivani@gmail.com> wrote:
>
>> Hello Koert,
>>
>> That did not work; I mentioned it in my earlier email. But I found a
>> way around it by excluding the akka dependencies
>>
>> Shivani
>>
>>
>> On Tue, Apr 29, 2014 at 12:37 PM, Koert Kuipers <koert@tresata.com>wrote:
>>
>>> you need to merge the reference.conf files, and then it's no longer an issue.
>>>
>>> see the build file for Spark itself:
>>>   case "reference.conf" => MergeStrategy.concat
>>>
>>>
>>> On Tue, Apr 29, 2014 at 3:32 PM, Shivani Rao <raoshivani@gmail.com>wrote:
>>>
>>>> Hello folks,
>>>>
>>>> I was going to post this question to spark user group as well. If you
>>>> have any leads on how to solve this issue please let me know:
>>>>
>>>> I am trying to build a basic Spark project (Spark depends on akka), and
>>>> I am trying to create a fat jar using sbt assembly. The goal is to run the
>>>> fat jar from the command line as follows:
>>>>  java -cp "path to my spark fatjar" mainclassname
>>>>
>>>> I encountered deduplication errors in the following akka libraries
>>>> during sbt assembly:
>>>> akka-remote_2.10-2.2.3.jar with
>>>> akka-remote_2.10-2.2.3-shaded-protobuf.jar
>>>> akka-actor_2.10-2.2.3.jar with
>>>> akka-actor_2.10-2.2.3-shaded-protobuf.jar
>>>>
>>>> I resolved them by using MergeStrategy.first, which let the sbt assembly
>>>> command complete successfully. But at runtime, one akka configuration
>>>> parameter or another kept failing with the following message
>>>>
>>>> "Exception in thread "main"
>>>> com.typesafe.config.ConfigException$Missing: No configuration setting found
>>>> for key"
>>>>
>>>> I then used MergeStrategy.concat for "reference.conf" and I started
>>>> getting this repeated error
>>>>
>>>> Exception in thread "main" com.typesafe.config.ConfigException$Missing:
>>>> No configuration setting found for key 'akka.version'.
>>>>
>>>> I noticed that akka.version is only in the akka-actor jars and not in
>>>> the akka-remote. The resulting reference.conf (in my final fat jar) does
>>>> not contain akka.version either. So the strategy is not working.
>>>>
>>>> There are several things I could try
>>>>
>>>> a) Use the following dependency https://github.com/sbt/sbt-proguard
>>>> b) Write a build.scala to handle merging of reference.conf
>>>>
>>>> https://spark-project.atlassian.net/browse/SPARK-395
>>>>
>>>> http://letitcrash.com/post/21025950392/howto-sbt-assembly-vs-reference-conf
>>>>
>>>> c) Create a reference.conf by merging all akka configurations and then
>>>> passing it on the java command line as shown below
>>>>
>>>> java -cp <jar-name> -Dconfig.file=<config> mainclassname
>>>>
>>>> The main issue is that if I run the spark jar via "sbt run" there are no
>>>> errors in accessing any of the akka configuration parameters. It is only
>>>> when I run it via the command line (java -cp <jar-name> classname) that I
>>>> encounter the error.
>>>>
>>>> Which of these is a long-term fix to the akka issues? For now, I removed
>>>> the akka dependencies and that solved the problem, but I know that is not
>>>> a long-term solution
>>>>
>>>> Regards,
>>>> Shivani
>>>>
>>>> --
>>>> Software Engineer
>>>> Analytics Engineering Team@ Box
>>>> Mountain View, CA
>>>>
>>>
>>>
>>
>>
>> --
>> Software Engineer
>> Analytics Engineering Team@ Box
>> Mountain View, CA
>>
>


-- 
Software Engineer
Analytics Engineering Team@ Box
Mountain View, CA
