spark-user mailing list archives

From Daniil Osipov <daniil.osi...@shazam.com>
Subject Re: Spark Streaming with Kafka, building project with 'sbt assembly' is extremely slow
Date Tue, 02 Sep 2014 21:13:18 GMT
What version of sbt are you using? There is a bug in early versions of 0.13
that makes assembly extremely slow; make sure you're using the latest 0.13.x
release.
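For reference, here is a minimal sketch of where the tool versions are pinned, assuming sbt 0.13. The sbt version itself comes from the `sbt.version` property in `project/build.properties`, not from any Scala file. The assembly plugin is declared in `project/plugins.sbt`. The plugin version 0.11.2 below is an assumption, not something stated in this thread. It was chosen from the pre-0.12 sbt-assembly line, whose API matches the `import AssemblyKeys._` and `assemblySettings` calls in the quoted build.sbt.

```scala
// project/plugins.sbt (sbt 0.13 syntax)
// Assumption: sbt-assembly 0.11.2, a pre-0.12 release that still exposes
// `import AssemblyKeys._` and `assemblySettings` as used in build.sbt.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")
```

Running `sbt about` in the project directory prints the sbt version the build actually loads. This is a quick way to confirm you are not on one of the early 0.13 releases.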


On Fri, Aug 29, 2014 at 1:30 PM, Aris <arisofalaska@gmail.com> wrote:

> Hi folks,
>
> I am trying to use Kafka with Spark Streaming, and it appears I cannot do
> the normal 'sbt package' as I do with other Spark applications, such as
> Spark alone or Spark with MLlib. I learned I have to build with the
> sbt-assembly plugin.
>
> OK, so here is my build.sbt file for my extremely simple test Kafka/Spark
> Streaming project. It takes almost 30 minutes to build! This is a CentOS
> Linux machine with SSDs and 4GB of RAM, and it has never been slow for me.
> For comparison, sbt assembly for the entire Spark project itself takes less
> than 10 minutes.
>
> At the bottom of this file I am experimenting with the 'cacheOutput' option,
> because I read online that the slowness may come from computing SHA-1 hashes
> of all the *.class files in this super JAR.
>
> I also copied the mergeStrategy from Spark contributor TD's Spark Streaming
> tutorial at Spark Summit 2014.
>
> Again, is there some better way to build this JAR file, ideally just using
> sbt package? This process works, but it is very slow.
>
> Any help with speeding up this build would be really appreciated!
>
> Aris
>
> -----------------------------------------
>
> import AssemblyKeys._ // put this at the top of the file
>
> name := "streamingKafka"
>
> version := "1.0"
>
> scalaVersion := "2.10.4"
>
> libraryDependencies ++= Seq(
>   "org.apache.spark" %% "spark-core" % "1.0.1" % "provided",
>   "org.apache.spark" %% "spark-streaming" % "1.0.1" % "provided",
>   "org.apache.spark" %% "spark-streaming-kafka" % "1.0.1"
> )
>
> assemblySettings
>
> jarName in assembly := "streamingkafka-assembly.jar"
>
> mergeStrategy in assembly := {
>   case m if m.toLowerCase.endsWith("manifest.mf")          => MergeStrategy.discard
>   case m if m.toLowerCase.matches("meta-inf.*\\.sf$")      => MergeStrategy.discard
>   case "log4j.properties"                                  => MergeStrategy.discard
>   case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
>   case "reference.conf"                                    => MergeStrategy.concat
>   case _                                                   => MergeStrategy.first
> }
>
> assemblyOption in assembly ~= { _.copy(cacheOutput = false) }
>
>
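The `cacheOutput = false` setting only skips the caching of the finished JAR, which is the step that hashes every *.class file. Below is a minimal sketch of two related knobs. It assumes the sbt-assembly 0.11.x `AssemblyOption` fields `cacheUnzip` and `includeScala`, so check them against the README of the plugin version you actually use. `cacheUnzip` (on by default) reuses the unzipped dependency JARs between runs. `includeScala = false` leaves scala-library out of the assembly. That assumes the Spark runtime already provides Scala, which is worth verifying on your cluster.

```scala
// build.sbt (sbt 0.13, sbt-assembly 0.11.x syntax; field names are an assumption to verify)
// cacheOutput = false:  do not hash every *.class file to decide whether the JAR changed.
// cacheUnzip = true:    keep reusing unzipped dependency JARs across runs (the default).
// includeScala = false: omit scala-library; assumes Spark supplies it at runtime.
assemblyOption in assembly ~= {
  _.copy(cacheOutput = false, cacheUnzip = true, includeScala = false)
}
```

Leaving Scala out mainly shrinks the JAR that has to be written on every run. Whether it noticeably shortens a 30-minute build is untested here.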
