spark-dev mailing list archives

From Alex Cozzi <alexco...@gmail.com>
Subject excluding hadoop dependencies in spark's assembly files
Date Mon, 06 Jan 2014 22:33:12 GMT
I am trying to exclude the Hadoop jar dependencies from Spark's assembly files, the reason
being that, in order to work on our cluster, we need to use our own versions of those jars
instead of the published ones. I tried defining the Hadoop dependencies as "provided", but
surprisingly this causes compilation errors in the build. Just to be clear, I modified the
sbt build file as follows:

  def yarnEnabledSettings = Seq(
    libraryDependencies ++= Seq(
      // Exclude rule required for all ?
      "org.apache.hadoop" % "hadoop-client" % hadoopVersion % "provided" excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib),
      "org.apache.hadoop" % "hadoop-yarn-api" % hadoopVersion % "provided" excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib),
      "org.apache.hadoop" % "hadoop-yarn-common" % hadoopVersion % "provided" excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib),
      "org.apache.hadoop" % "hadoop-yarn-client" % hadoopVersion % "provided" excludeAll(excludeJackson, excludeNetty, excludeAsm, excludeCglib)
    )
  )

and build the assembly with:

  SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true SPARK_IS_NEW_HADOOP=true sbt assembly
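
A quick way to check what actually ends up in the assembly is to list the jar contents (the path below is a guess and depends on the Scala and Spark versions used by the build):

  jar tf assembly/target/scala-2.10/spark-assembly-*.jar | grep -i hadoop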


But the assembly still includes the Hadoop libraries, contrary to what the sbt-assembly docs say.
I managed to exclude them instead by using the non-recommended way:
def extraAssemblySettings() = Seq(
    test in assembly := {},
    mergeStrategy in assembly := {
      case m if m.toLowerCase.endsWith("manifest.mf") => MergeStrategy.discard
      case m if m.toLowerCase.matches("meta-inf.*\\.sf$") => MergeStrategy.discard
      case "log4j.properties" => MergeStrategy.discard
      case m if m.toLowerCase.startsWith("meta-inf/services/") => MergeStrategy.filterDistinctLines
      case "reference.conf" => MergeStrategy.concat
      case _ => MergeStrategy.first
    },
    // Drop every jar whose file name contains "hadoop" from the assembly.
    excludedJars in assembly <<= (fullClasspath in assembly) map { cp =>
      cp filter { _.data.getName.contains("hadoop") }
    }
)
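
For what it's worth, an untested variant of that filter would match on the module organization rather than on the jar file name, assuming sbt attaches the module ID to the attributed classpath entries (moduleID.key); that would also catch Hadoop artifacts whose file names do not happen to contain "hadoop":

    excludedJars in assembly <<= (fullClasspath in assembly) map { cp =>
      // Untested sketch: filter on the ModuleID recorded for each classpath entry.
      // "org.apache.hadoop" is the organization of the hadoop-client and
      // hadoop-yarn-* dependencies above; entries with no module ID attached
      // (e.g. internal project classes) stay in the assembly.
      cp filter { entry =>
        entry.get(moduleID.key).exists(_.organization == "org.apache.hadoop")
      }
    }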


But I would like to hear whether there is interest in excluding the Hadoop jars by default
in the build.
Alex Cozzi
alexcozzi@gmail.com
