From: Ewan Higgs
Date: Fri, 06 Mar 2015 10:41:14 +0100
To: dev@spark.apache.org
Subject: Fwd: Spark-perf terasort WIP branch

Hi all,

I never heard back from anyone on this, and I have received private emails saying that people would like to add terasort to their spark-perf installs so that it becomes part of their cluster validation checks.

Yours,
Ewan

-------- Forwarded Message --------
Subject: Spark-perf terasort WIP branch
Date: Wed, 14 Jan 2015 14:33:45 +0100
From: Ewan Higgs
To: dev@spark.apache.org

Hi all,

I'm trying to build the Spark-perf terasort WIP code, but there are some errors to do with the Hadoop APIs. I presume this is because a Hadoop version is set somewhere and the build compiles against it, but I can't seem to find where.
The errors are as follows:

[info] Compiling 15 Scala sources and 2 Java sources to /home/ehiggs/src/spark-perf/spark-tests/target/scala-2.10/classes...
[error] /home/ehiggs/src/spark-perf/spark-tests/src/main/scala/spark/perf/terasort/TeraInputFormat.scala:40: object task is not a member of package org.apache.hadoop.mapreduce
[error] import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
[error]        ^
[error] /home/ehiggs/src/spark-perf/spark-tests/src/main/scala/spark/perf/terasort/TeraInputFormat.scala:132: not found: type TaskAttemptContextImpl
[error]     val context = new TaskAttemptContextImpl(
[error]                       ^
[error] /home/ehiggs/src/spark-perf/spark-tests/src/main/scala/spark/perf/terasort/TeraScheduler.scala:37: object TTConfig is not a member of package org.apache.hadoop.mapreduce.server.tasktracker
[error] import org.apache.hadoop.mapreduce.server.tasktracker.TTConfig
[error]        ^
[error] /home/ehiggs/src/spark-perf/spark-tests/src/main/scala/spark/perf/terasort/TeraScheduler.scala:91: not found: value TTConfig
[error]     var slotsPerHost : Int = conf.getInt(TTConfig.TT_MAP_SLOTS, 4)
[error]                                          ^
[error] /home/ehiggs/src/spark-perf/spark-tests/src/main/scala/spark/perf/terasort/TeraSortAll.scala:7: value run is not a member of org.apache.spark.examples.terasort.TeraGen
[error]     tg.run(Array[String]("10M", "/tmp/terasort_in"))
[error]        ^
[error] /home/ehiggs/src/spark-perf/spark-tests/src/main/scala/spark/perf/terasort/TeraSortAll.scala:9: value run is not a member of org.apache.spark.examples.terasort.TeraSort
[error]     ts.run(Array[String]("/tmp/terasort_in", "/tmp/terasort_out"))
[error]        ^
[error] 6 errors found
[error] (compile:compile) Compilation failed
[error] Total time: 13 s, completed 05-Jan-2015 12:21:47

I can build the same code if it's in the Spark tree using the following command:

mvn -Dhadoop.version=2.5.0 -DskipTests=true install

Is there a way I can convince spark-perf to build this code with the appropriate Hadoop library version? I tried applying the following to spark-tests/project/SparkTestsBuild.scala, but it didn't seem to work as I expected:

$ git diff project/SparkTestsBuild.scala
diff --git a/spark-tests/project/SparkTestsBuild.scala b/spark-tests/project/SparkTestsBuild.scala
index 4116326..4ed5f0c 100644
--- a/spark-tests/project/SparkTestsBuild.scala
+++ b/spark-tests/project/SparkTestsBuild.scala
@@ -16,7 +16,9 @@ object SparkTestsBuild extends Build {
       "org.scalatest" %% "scalatest" % "2.2.1" % "test",
       "com.google.guava" % "guava" % "14.0.1",
       "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
-      "org.json4s" %% "json4s-native" % "3.2.9"
+      "org.json4s" %% "json4s-native" % "3.2.9",
+      "org.apache.hadoop" % "hadoop-common" % "2.5.0",
+      "org.apache.hadoop" % "hadoop-mapreduce" % "2.5.0"
     ),
     test in assembly := {},
     outputPath in assembly := file("target/spark-perf-tests-assembly.jar"),
@@ -36,4 +38,4 @@ object SparkTestsBuild extends Build {
       case _ => MergeStrategy.first
     }
   ))
-}
\ No newline at end of file
+}

Yours,
Ewan
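
P.S. One thing I haven't tried yet: if I'm reading the Hadoop module layout right, the classes the compiler can't find (TaskAttemptContextImpl and TTConfig) ship in the hadoop-mapreduce-client-core jar rather than in an artifact named hadoop-mapreduce, which I believe is a POM-only aggregator. So a dependency block along these lines might resolve them; this is only an untested sketch of the same SparkTestsBuild.scala settings:

    libraryDependencies ++= Seq(
      "org.scalatest" %% "scalatest" % "2.2.1" % "test",
      "com.google.guava" % "guava" % "14.0.1",
      "org.apache.spark" %% "spark-core" % "1.0.0" % "provided",
      "org.json4s" %% "json4s-native" % "3.2.9",
      // Untested guess: depend on the MapReduce client jars directly
      // instead of the POM-only "hadoop-mapreduce" aggregator.
      "org.apache.hadoop" % "hadoop-client" % "2.5.0",
      "org.apache.hadoop" % "hadoop-mapreduce-client-core" % "2.5.0"
    )

If that works, it would at least confirm the failures are only about which Hadoop artifacts end up on the compile classpath.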