spark-user mailing list archives

From Anahita Talebi <anahita.t.am...@gmail.com>
Subject Re: Upgrade the scala code using the most updated Spark version
Date Tue, 28 Mar 2017 22:52:45 GMT
Thanks.
I tried this one, as well. Unfortunately I still get the same error.

On Wednesday, March 29, 2017, Marco Mistroni <mmistroni@gmail.com> wrote:

> 1.7.5
>
> On 28 Mar 2017 10:10 pm, "Anahita Talebi" <anahita.t.amiri@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks for your answer.
>> What is the version of "org.slf4j" % "slf4j-api" in your sbt file?
>> I think the problem might come from this part.
>>
>> On Tue, Mar 28, 2017 at 11:02 PM, Marco Mistroni <mmistroni@gmail.com> wrote:
>>
>>> Hello
>>>  Uhm, I have a project whose build.sbt is close to yours, where I am
>>> using Spark 2.1, Scala 2.11 and scalatest (I upgraded to 3.0.0), and it
>>> works fine.
>>> In my projects, though, I don't have any of the following libraries that
>>> you mention:
>>> - breeze
>>> - netlib all
>>> - scopt
>>>
>>> hth
>>>
>>> On Tue, Mar 28, 2017 at 9:10 PM, Anahita Talebi <anahita.t.amiri@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Thanks for your answer.
>>>>
>>>> I first changed the Scala version to 2.11.8 and kept the Spark version at
>>>> 1.5.2 (the old version). Then I changed the scalatest version to "3.0.1".
>>>> With this configuration, I could compile and run the code and generate
>>>> the .jar file.
>>>>
>>>> When I changed the Spark version to 2.1.0, I got the same error as
>>>> before. So I imagine the problem is somehow related to the Spark
>>>> version.
>>>>
>>>> Cheers,
>>>> Anahita
>>>>
>>>> ------------------------------------------------------------
>>>> import AssemblyKeys._
>>>>
>>>> assemblySettings
>>>>
>>>> name := "proxcocoa"
>>>>
>>>> version := "0.1"
>>>>
>>>> organization := "edu.berkeley.cs.amplab"
>>>>
>>>> scalaVersion := "2.11.8"
>>>>
>>>> parallelExecution in Test := false
>>>>
>>>> {
>>>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>>>   libraryDependencies ++= Seq(
>>>>     "org.slf4j" % "slf4j-api" % "1.7.2",
>>>>     "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>>>>     "org.scalatest" %% "scalatest" % "3.0.1" % "test",
>>>>     "org.apache.spark" %% "spark-core" % "2.1.0" excludeAll(excludeHadoop),
>>>>     "org.apache.spark" %% "spark-mllib" % "2.1.0" excludeAll(excludeHadoop),
>>>>     "org.apache.spark" %% "spark-sql" % "2.1.0" excludeAll(excludeHadoop),
>>>>     "org.apache.commons" % "commons-compress" % "1.7",
>>>>     "commons-io" % "commons-io" % "2.4",
>>>>     "org.scalanlp" % "breeze_2.11" % "0.11.2",
>>>>     "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>>>>     "com.github.scopt" %% "scopt" % "3.3.0"
>>>>   )
>>>> }
>>>>
>>>> {
>>>>   val defaultHadoopVersion = "1.0.4"
>>>>   val hadoopVersion =
>>>>     scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", defaultHadoopVersion)
>>>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
>>>> }
>>>>
>>>> libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.1.0"
>>>>
>>>> resolvers ++= Seq(
>>>>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + ".m2/repository",
>>>>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>>>   "Spray" at "http://repo.spray.cc"
>>>> )
>>>>
>>>> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>>>   {
>>>>     case PathList("javax", "servlet", xs @ _*)           => MergeStrategy.first
>>>>     case PathList(ps @ _*) if ps.last endsWith ".html"   => MergeStrategy.first
>>>>     case "application.conf"                              => MergeStrategy.concat
>>>>     case "reference.conf"                                => MergeStrategy.concat
>>>>     case "log4j.properties"                              => MergeStrategy.discard
>>>>     case m if m.toLowerCase.endsWith("manifest.mf")      => MergeStrategy.discard
>>>>     case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  => MergeStrategy.discard
>>>>     case _ => MergeStrategy.first
>>>>   }
>>>> }
>>>>
>>>> test in assembly := {}
>>>> ------------------------------------------------------------
>>>>
>>>> On Tue, Mar 28, 2017 at 9:33 PM, Marco Mistroni <mmistroni@gmail.com> wrote:
>>>>
>>>>> Hello
>>>>>  that looks to me like there's something dodgy with your Scala
>>>>> installation.
>>>>> Though Spark 2.0 is built on Scala 2.11, it still supports 2.10... I
>>>>> suggest you change one thing at a time in your sbt file.
>>>>> First the Spark version: run it and see if it works.
>>>>> Then amend the Scala version.
>>>>>
>>>>> hth
>>>>>  marco
>>>>>
>>>>> On Tue, Mar 28, 2017 at 5:20 PM, Anahita Talebi <anahita.t.amiri@gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Thank you all for your informative answers.
>>>>>> I changed the Scala version to 2.11.8 and the Spark version
>>>>>> to 2.1.0 in the build.sbt.
>>>>>>
>>>>>> Except for these two (the Scala and Spark versions), I kept the same
>>>>>> values for the rest of the build.sbt file.
>>>>>> ----------------------------------------------------------------
>>>>>> import AssemblyKeys._
>>>>>>
>>>>>> assemblySettings
>>>>>>
>>>>>> name := "proxcocoa"
>>>>>>
>>>>>> version := "0.1"
>>>>>>
>>>>>> scalaVersion := "2.11.8"
>>>>>>
>>>>>> parallelExecution in Test := false
>>>>>>
>>>>>> {
>>>>>>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>>>>>   libraryDependencies ++= Seq(
>>>>>>     "org.slf4j" % "slf4j-api" % "1.7.2",
>>>>>>     "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>>>>>>     "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>>>>>>     "org.apache.spark" % "spark-core_2.11" % "2.1.0" excludeAll(excludeHadoop),
>>>>>>     "org.apache.spark" % "spark-mllib_2.11" % "2.1.0" excludeAll(excludeHadoop),
>>>>>>     "org.apache.spark" % "spark-sql_2.11" % "2.1.0" excludeAll(excludeHadoop),
>>>>>>     "org.apache.commons" % "commons-compress" % "1.7",
>>>>>>     "commons-io" % "commons-io" % "2.4",
>>>>>>     "org.scalanlp" % "breeze_2.11" % "0.11.2",
>>>>>>     "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>>>>>>     "com.github.scopt" %% "scopt" % "3.3.0"
>>>>>>   )
>>>>>> }
>>>>>>
>>>>>> {
>>>>>>   val defaultHadoopVersion = "1.0.4"
>>>>>>   val hadoopVersion =
>>>>>>     scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", defaultHadoopVersion)
>>>>>>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
>>>>>> }
>>>>>>
>>>>>> libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.1.0"
>>>>>>
>>>>>> resolvers ++= Seq(
>>>>>>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + ".m2/repository",
>>>>>>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>>>>>   "Spray" at "http://repo.spray.cc"
>>>>>> )
>>>>>>
>>>>>> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>>>>>   {
>>>>>>     case PathList("javax", "servlet", xs @ _*)           => MergeStrategy.first
>>>>>>     case PathList(ps @ _*) if ps.last endsWith ".html"   => MergeStrategy.first
>>>>>>     case "application.conf"                              => MergeStrategy.concat
>>>>>>     case "reference.conf"                                => MergeStrategy.concat
>>>>>>     case "log4j.properties"                              => MergeStrategy.discard
>>>>>>     case m if m.toLowerCase.endsWith("manifest.mf")      => MergeStrategy.discard
>>>>>>     case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  => MergeStrategy.discard
>>>>>>     case _ => MergeStrategy.first
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> test in assembly := {}
>>>>>> ----------------------------------------------------------------
>>>>>>
>>>>>> When I compile the code, I get the following error:
>>>>>>
>>>>>> [info] Compiling 4 Scala sources to /Users/atalebi/Desktop/new_version_proxcocoa-master/target/scala-2.11/classes...
>>>>>> [error] /Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:40: value mapPartitionsWithSplit is not a member of org.apache.spark.rdd.RDD[String]
>>>>>> [error]     val sizes = data.mapPartitionsWithSplit{ case(i,lines) =>
>>>>>> [error]                      ^
>>>>>> [error] /Users/atalebi/Desktop/new_version_proxcocoa-master/src/main/scala/utils/OptUtils.scala:41: value length is not a member of Any
>>>>>> [error]       Iterator(i -> lines.length)
>>>>>> [error]                           ^
>>>>>> ----------------------------------------------------------------
>>>>>> The error is in the code itself. Does that mean that for a different
>>>>>> version of Spark and Scala, I need to change the main code?
>>>>>>
>>>>>> Thanks,
>>>>>> Anahita
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Mar 28, 2017 at 10:28 AM, Dinko Srkoč <dinko.srkoc@gmail.com> wrote:
>>>>>>
>>>>>>> Adding to the advice given by others ... Spark 2.1.0 works with Scala
>>>>>>> 2.11, so set:
>>>>>>>
>>>>>>>   scalaVersion := "2.11.8"
>>>>>>>
>>>>>>> When you see something like:
>>>>>>>
>>>>>>>   "org.apache.spark" % "spark-core_2.10" % "1.5.2"
>>>>>>>
>>>>>>> that means that library `spark-core` is compiled against Scala 2.10,
>>>>>>> so you would have to change that to 2.11:
>>>>>>>
>>>>>>>   "org.apache.spark" % "spark-core_2.11" % "2.1.0"
>>>>>>>
>>>>>>> better yet, let SBT worry about libraries built against particular
>>>>>>> Scala versions:
>>>>>>>
>>>>>>>   "org.apache.spark" %% "spark-core" % "2.1.0"
>>>>>>>
>>>>>>> The `%%` will instruct SBT to choose the library appropriate for a
>>>>>>> version of Scala that is set in `scalaVersion`.
>>>>>>>
>>>>>>> It may be worth mentioning that the `%%` thing works only with Scala
>>>>>>> libraries as they are compiled against a certain Scala version. Java
>>>>>>> libraries are unaffected (have nothing to do with Scala), e.g. for
>>>>>>> `slf4j` one only uses single `%`s:
>>>>>>>
>>>>>>>   "org.slf4j" % "slf4j-api" % "1.7.2"
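>>>>>>>
>>>>>>> Put together, a dependency block along these lines might look as
>>>>>>> follows. This is only a sketch using the versions mentioned in this
>>>>>>> thread; the `sparkVersion` value is a name made up here so that the
>>>>>>> Spark version is written in one place:
>>>>>>>
>>>>>>> ```scala
>>>>>>> // build.sbt fragment (illustrative sketch, not the original build file)
>>>>>>> scalaVersion := "2.11.8"
>>>>>>>
>>>>>>> // Hypothetical helper value: keeps all Spark modules on one version
>>>>>>> val sparkVersion = "2.1.0"
>>>>>>>
>>>>>>> libraryDependencies ++= Seq(
>>>>>>>   // %% appends the Scala binary version, e.g. spark-core_2.11
>>>>>>>   "org.apache.spark" %% "spark-core"      % sparkVersion,
>>>>>>>   "org.apache.spark" %% "spark-mllib"     % sparkVersion,
>>>>>>>   "org.apache.spark" %% "spark-sql"       % sparkVersion,
>>>>>>>   "org.apache.spark" %% "spark-streaming" % sparkVersion,
>>>>>>>   // Plain Java library: single %, no Scala suffix
>>>>>>>   "org.slf4j" % "slf4j-api" % "1.7.2"
>>>>>>> )
>>>>>>> ```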
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Dinko
>>>>>>>
>>>>>>> On 27 March 2017 at 23:30, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>>>>>> > check these versions
>>>>>>> >
>>>>>>> > function create_build_sbt_file {
>>>>>>> >         BUILD_SBT_FILE=${GEN_APPSDIR}/scala/${APPLICATION}/build.sbt
>>>>>>> >         [ -f ${BUILD_SBT_FILE} ] && rm -f ${BUILD_SBT_FILE}
>>>>>>> >         cat >> $BUILD_SBT_FILE << !
>>>>>>> > lazy val root = (project in file(".")).
>>>>>>> >   settings(
>>>>>>> >     name := "${APPLICATION}",
>>>>>>> >     version := "1.0",
>>>>>>> >     scalaVersion := "2.11.8",
>>>>>>> >     mainClass in Compile := Some("myPackage.${APPLICATION}")
>>>>>>> >   )
>>>>>>> > libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0" % "provided"
>>>>>>> > libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0" % "provided"
>>>>>>> > libraryDependencies += "org.apache.spark" %% "spark-hive" % "2.0.0" % "provided"
>>>>>>> > libraryDependencies += "org.apache.spark" %% "spark-streaming" % "2.0.0" % "provided"
>>>>>>> > libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka" % "1.6.1" % "provided"
>>>>>>> > libraryDependencies += "com.google.code.gson" % "gson" % "2.6.2"
>>>>>>> > libraryDependencies += "org.apache.phoenix" % "phoenix-spark" % "4.6.0-HBase-1.0"
>>>>>>> > libraryDependencies += "org.apache.hbase" % "hbase" % "1.2.3"
>>>>>>> > libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.2.3"
>>>>>>> > libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.2.3"
>>>>>>> > libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.2.3"
>>>>>>> > // META-INF discarding
>>>>>>> > mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>>>>>> >    {
>>>>>>> >     case PathList("META-INF", xs @ _*) => MergeStrategy.discard
>>>>>>> >     case x => MergeStrategy.first
>>>>>>> >    }
>>>>>>> > }
>>>>>>> > !
>>>>>>> > }
>>>>>>> >
>>>>>>> > HTH
>>>>>>> >
>>>>>>> > Dr Mich Talebzadeh
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > LinkedIn
>>>>>>> > https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > http://talebzadehmich.wordpress.com
>>>>>>> >
>>>>>>> >
>>>>>>> > Disclaimer: Use it at your own risk. Any and all responsibility for any
>>>>>>> > loss, damage or destruction of data or any other property which may arise
>>>>>>> > from relying on this email's technical content is explicitly disclaimed. The
>>>>>>> > author will in no case be liable for any monetary damages arising from such
>>>>>>> > loss, damage or destruction.
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > On 27 March 2017 at 21:45, Jörn Franke <jornfranke@gmail.com> wrote:
>>>>>>> >>
>>>>>>> >> Usually you define the dependencies to the Spark library as provided. You
>>>>>>> >> also seem to mix different Spark versions which should be avoided.
>>>>>>> >> The Hadoop library seems to be outdated and should also only be provided.
>>>>>>> >>
>>>>>>> >> The other dependencies you could assemble in a fat jar.
>>>>>>> >>
>>>>>>> >> On 27 Mar 2017, at 21:25, Anahita Talebi <anahita.t.amiri@gmail.com> wrote:
>>>>>>> >>
>>>>>>> >> Hi friends,
>>>>>>> >>
>>>>>>> >> I have some code written in Scala. Scala version 2.10.4 and
>>>>>>> >> Spark version 1.5.2 are used to run it.
>>>>>>> >>
>>>>>>> >> I would like to upgrade the code to the most recent version of Spark,
>>>>>>> >> meaning 2.1.0.
>>>>>>> >>
>>>>>>> >> Here is the build.sbt:
>>>>>>> >>
>>>>>>> >> import AssemblyKeys._
>>>>>>> >>
>>>>>>> >> assemblySettings
>>>>>>> >>
>>>>>>> >> name := "proxcocoa"
>>>>>>> >>
>>>>>>> >> version := "0.1"
>>>>>>> >>
>>>>>>> >> scalaVersion := "2.10.4"
>>>>>>> >>
>>>>>>> >> parallelExecution in Test := false
>>>>>>> >>
>>>>>>> >> {
>>>>>>> >>   val excludeHadoop = ExclusionRule(organization = "org.apache.hadoop")
>>>>>>> >>   libraryDependencies ++= Seq(
>>>>>>> >>     "org.slf4j" % "slf4j-api" % "1.7.2",
>>>>>>> >>     "org.slf4j" % "slf4j-log4j12" % "1.7.2",
>>>>>>> >>     "org.scalatest" %% "scalatest" % "1.9.1" % "test",
>>>>>>> >>     "org.apache.spark" % "spark-core_2.10" % "1.5.2" excludeAll(excludeHadoop),
>>>>>>> >>     "org.apache.spark" % "spark-mllib_2.10" % "1.5.2" excludeAll(excludeHadoop),
>>>>>>> >>     "org.apache.spark" % "spark-sql_2.10" % "1.5.2" excludeAll(excludeHadoop),
>>>>>>> >>     "org.apache.commons" % "commons-compress" % "1.7",
>>>>>>> >>     "commons-io" % "commons-io" % "2.4",
>>>>>>> >>     "org.scalanlp" % "breeze_2.10" % "0.11.2",
>>>>>>> >>     "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly(),
>>>>>>> >>     "com.github.scopt" %% "scopt" % "3.3.0"
>>>>>>> >>   )
>>>>>>> >> }
>>>>>>> >>
>>>>>>> >> {
>>>>>>> >>   val defaultHadoopVersion = "1.0.4"
>>>>>>> >>   val hadoopVersion =
>>>>>>> >>     scala.util.Properties.envOrElse("SPARK_HADOOP_VERSION", defaultHadoopVersion)
>>>>>>> >>   libraryDependencies += "org.apache.hadoop" % "hadoop-client" % hadoopVersion
>>>>>>> >> }
>>>>>>> >>
>>>>>>> >> libraryDependencies += "org.apache.spark" % "spark-streaming_2.10" % "1.5.0"
>>>>>>> >>
>>>>>>> >> resolvers ++= Seq(
>>>>>>> >>   "Local Maven Repository" at Path.userHome.asFile.toURI.toURL + ".m2/repository",
>>>>>>> >>   "Typesafe" at "http://repo.typesafe.com/typesafe/releases",
>>>>>>> >>   "Spray" at "http://repo.spray.cc"
>>>>>>> >> )
>>>>>>> >>
>>>>>>> >> mergeStrategy in assembly <<= (mergeStrategy in assembly) { (old) =>
>>>>>>> >>   {
>>>>>>> >>     case PathList("javax", "servlet", xs @ _*)           => MergeStrategy.first
>>>>>>> >>     case PathList(ps @ _*) if ps.last endsWith ".html"   => MergeStrategy.first
>>>>>>> >>     case "application.conf"                              => MergeStrategy.concat
>>>>>>> >>     case "reference.conf"                                => MergeStrategy.concat
>>>>>>> >>     case "log4j.properties"                              => MergeStrategy.discard
>>>>>>> >>     case m if m.toLowerCase.endsWith("manifest.mf")      => MergeStrategy.discard
>>>>>>> >>     case m if m.toLowerCase.matches("meta-inf.*\\.sf$")  => MergeStrategy.discard
>>>>>>> >>     case _ => MergeStrategy.first
>>>>>>> >>   }
>>>>>>> >> }
>>>>>>> >>
>>>>>>> >> test in assembly := {}
>>>>>>> >>
>>>>>>> >> -----------------------------------------------------------
>>>>>>> >> I downloaded Spark 2.1.0 and changed the Spark version and
>>>>>>> >> scalaVersion in the build.sbt. But unfortunately, I failed to run the
>>>>>>> >> code.
>>>>>>> >>
>>>>>>> >> Does anybody know how I can upgrade the code to the most recent Spark
>>>>>>> >> version by changing the build.sbt file?
>>>>>>> >>
>>>>>>> >> Or do you have any other suggestion?
>>>>>>> >>
>>>>>>> >> Thanks a lot,
>>>>>>> >> Anahita
>>>>>>> >>
>>>>>>> >
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
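
A note on the compile error quoted above (`value mapPartitionsWithSplit is not a member of org.apache.spark.rdd.RDD[String]`): `mapPartitionsWithSplit` had long been deprecated and, as far as I know, was removed in Spark 2.0, with `mapPartitionsWithIndex` as the replacement taking the same kind of function. The second error (`value length is not a member of Any`) looks like a knock-on effect of the first. Below is a minimal sketch of the changed call, assuming a local SparkSession; the object name `PartitionSizes`, the helper `partitionSizes` and the sample data are made up for illustration and are not taken from OptUtils.scala:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object PartitionSizes {
  // Hypothetical helper: number of lines in each partition, keyed by partition index.
  def partitionSizes(data: RDD[String]): Array[(Int, Int)] =
    data.mapPartitionsWithIndex { case (i, lines) =>
      // `lines` is an Iterator[String] over the records of partition i
      Iterator(i -> lines.length)
    }.collect()

  def main(args: Array[String]): Unit = {
    // Local session purely for illustration; a real job would get its master from spark-submit
    val spark = SparkSession.builder()
      .appName("partition-sizes")
      .master("local[2]")
      .getOrCreate()

    // Made-up sample data spread over two partitions
    val data = spark.sparkContext.parallelize(Seq("a", "b", "c", "d"), numSlices = 2)
    partitionSizes(data).foreach { case (i, n) => println(s"partition $i: $n lines") }

    spark.stop()
  }
}
```

If this is indeed the cause, the build.sbt changes alone would not be enough, and the call in the main code would need the same one-word rename.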
