spark-user mailing list archives

From Chandeep Singh ...@chandeep.com>
Subject Re: Error building a self contained Spark app
Date Sat, 05 Mar 2016 00:46:35 GMT
#3 If your code depends on other projects, you will need to package everything together
in order to distribute it over a Spark cluster.

In your example below I don’t see much of an advantage in building a package.
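
A rough sketch of what the sbt side of packaging could look like for this app. The name, Scala version and Spark 1.5.0 are taken from the sequence.sbt quoted further down; putting spark-sql on the same 1.5.0 as the other modules and marking the Spark jars as "provided" (so that spark-submit supplies them at run time) are assumptions on my side, not something confirmed in this thread.

```scala
// sequence.sbt (sketch, not a tested build)
name := "Sequence"
version := "1.0"
scalaVersion := "2.10.5"

// Assumption: all Spark modules are kept on one version and scoped "provided",
// because the Spark installation on the cluster already ships these jars.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.5.0" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.5.0" % "provided",
  "org.apache.spark" %% "spark-hive" % "1.5.0" % "provided"
)
```

With something like this, "sbt package" writes a jar under target/scala-2.10/ that can be handed to spark-submit with --class Sequence. If the code pulls in other libraries, a fat jar (for example via the sbt-assembly plugin) is one way to package everything together.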

> On Mar 5, 2016, at 12:32 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> 
> Answers to first two questions are 'yes'
> 
> Not clear on what the 3rd question is asking.
> 
> On Fri, Mar 4, 2016 at 4:28 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
> Thanks, it is all working now. Also, selects from temp tables are part of sqlContext, not HiveContext.
> 
> The final code that works is below.
> 
> 
> A couple of questions, if I may:
> 
> 1. This works pretty effortlessly in spark-shell. Is this because $CLASSPATH already includes all the needed jars?
> 2. The import section imports the needed classes. So basically, does import org.apache.spark.sql.functions._ import all the methods of the functions class?
> 3. Why should we use sbt to build custom jars from Spark code, as opposed to running the code from a file against spark-shell? Is there a particular use case for it?
> 
> import org.apache.spark.SparkContext
> import org.apache.spark.SparkConf
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.hive.HiveContext
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.SQLContext
> import org.apache.spark.sql.functions._
> //
> object Sequence {
>   def main(args: Array[String]) {
>   val conf = new SparkConf().setAppName("Sequence").setMaster("local[*]").set("spark.driver.allowMultipleContexts", "true")
>   val sc = new SparkContext(conf)
>   // Note that this should be done only after an instance of org.apache.spark.sql.SQLContext is created. It should be written as:
>   val sqlContext= new org.apache.spark.sql.SQLContext(sc)
>   import sqlContext.implicits._
>   val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>   val a = Seq(("Mich",20), ("Christian", 18), ("James",13), ("Richard",16))
>   // Sort option 1 using tempTable
>   val b = a.toDF("Name","score").registerTempTable("tmp")
>   sqlContext.sql("select Name,score from tmp order by score desc").show
>   // Sort option 2 with FP
>   a.toDF("Name","score").sort(desc("score")).show
>  }
> }
> 
> Dr Mich Talebzadeh
>  
> LinkedIn  https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> 
> On 4 March 2016 at 23:58, Chandeep Singh <cs@chandeep.com> wrote:
> That is because an instance of org.apache.spark.sql.SQLContext doesn’t exist in the current context and is required before you can use any of its implicit methods.
> 
> As Ted mentioned, importing org.apache.spark.sql.functions._ will take care of the error below.
> 
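To illustrate why the import has to come after the val, here is a small self-contained sketch without Spark. Context, implicits and toLabelled are made-up names standing in for SQLContext, its implicits object and toDF. The only point is that members of an instance can be imported once a stable val for that instance is in scope, which is why the import sits inside main and not at the top of the file.

```scala
// Hypothetical stand-in for SQLContext: the implicits live inside an instance.
class Context(val label: String) {
  object implicits {
    // Adds a toLabelled method to any Seq, much like toDF is added to a Seq.
    implicit class SeqOps[A](val xs: Seq[A]) {
      def toLabelled: String = label + ": " + xs.mkString(", ")
    }
  }
}

object ImportScopeDemo {
  def render(): String = {
    val ctx = new Context("scores") // the instance must exist first
    import ctx.implicits._          // only now can its members be imported
    Seq(20, 18, 13).toLabelled
  }

  def main(args: Array[String]): Unit = println(render())
}
```

Moving the import above the val, or to the top of the file, fails to compile because ctx does not exist at that point, which is the same kind of "not found: object sqlContext" failure reported further down.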
> 
>> On Mar 4, 2016, at 11:35 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>> 
>> Thanks. It is like a war of attrition. I always thought that you add imports before the class itself, not within the class. What is the reason for it, please?
>> 
>> this is my code
>> 
>> import org.apache.spark.SparkContext
>> import org.apache.spark.SparkConf
>> import org.apache.spark.sql.Row
>> import org.apache.spark.sql.hive.HiveContext
>> import org.apache.spark.sql.types._
>> import org.apache.spark.sql.SQLContext
>> //
>> object Sequence {
>>   def main(args: Array[String]) {
>>   val conf = new SparkConf().setAppName("Sequence").setMaster("local[*]").set("spark.driver.allowMultipleContexts", "true")
>>   val sc = new SparkContext(conf)
>>   // Note that this should be done only after an instance of org.apache.spark.sql.SQLContext is created. It should be written as:
>>   val sqlContext= new org.apache.spark.sql.SQLContext(sc)
>>   import sqlContext.implicits._
>>   val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>   val a = Seq(("Mich",20), ("Christian", 18), ("James",13), ("Richard",16))
>>   // Sort option 1 using tempTable
>>   val b = a.toDF("Name","score").registerTempTable("tmp")
>>   HiveContext.sql("select Name,score from tmp order by score desc").show
>>   // Sort option 2 with FP
>>   a.toDF("Name","score").sort(desc("score")).show
>>  }
>> }
>> 
>> And now the last failure is in
>> 
>> [info]  [SUCCESSFUL ] org.scala-lang#jline;2.10.5!jline.jar (104ms)
>> [info] Done updating.
>> [info] Compiling 1 Scala source to /home/hduser/dba/bin/scala/Sequence/target/scala-2.10/classes...
>> [info] 'compiler-interface' not yet compiled for Scala 2.10.5. Compiling...
>> [info]   Compilation completed in 15.779 s
>> [error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:21: not found: value desc
>> [error]   a.toDF("Name","score").sort(desc("score")).show
>> [error]                               ^
>> [error] one error found
>> [error] (compile:compileIncremental) Compilation failed
>> 
>> 
>> 
>> On 4 March 2016 at 23:25, Chandeep Singh <cs@chandeep.com> wrote:
>> This is what you need:
>> 
>>     val sc = new SparkContext(sparkConf)
>>     val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>     import sqlContext.implicits._
>> 
>>> On Mar 4, 2016, at 11:03 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>> 
>>> Hi Ted,
>>> 
>>> This is my code
>>> 
>>> import org.apache.spark.SparkConf
>>> import org.apache.spark.sql.Row
>>> import org.apache.spark.sql.hive.HiveContext
>>> import org.apache.spark.sql.types._
>>> import org.apache.spark.sql.SQLContext
>>> //
>>> object Sequence {
>>>   def main(args: Array[String]) {
>>>   val conf = new SparkConf().setAppName("Sequence").setMaster("local[*]").set("spark.driver.allowMultipleContexts", "true")
>>>   val sc = new SparkContext(conf)
>>>   val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>>   val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>   val a = Seq(("Mich",20), ("Christian", 18), ("James",13), ("Richard",16))
>>>   // Sort option 1 using tempTable
>>>   val b = a.toDF("Name","score").registerTempTable("tmp")
>>>   sql("select Name,score from tmp order by score desc").show
>>>   // Sort option 2 with FP
>>>   a.toDF("Name","score").sort(desc("score")).show
>>>  }
>>> }
>>> 
>>> And the error I am getting now is
>>> 
>>> [info] downloading https://repo1.maven.org/maven2/org/scala-lang/jline/2.10.5/jline-2.10.5.jar ...
>>> [info]  [SUCCESSFUL ] org.scala-lang#jline;2.10.5!jline.jar (103ms)
>>> [info] Done updating.
>>> [info] Compiling 1 Scala source to /home/hduser/dba/bin/scala/Sequence/target/scala-2.10/classes...
>>> [info] 'compiler-interface' not yet compiled for Scala 2.10.5. Compiling...
>>> [info]   Compilation completed in 12.462 s
>>> [error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:16: value toDF is not a member of Seq[(String, Int)]
>>> [error]   val b = a.toDF("Name","score").registerTempTable("tmp")
>>> [error]             ^
>>> [error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:17: not found: value sql
>>> [error]   sql("select Name,score from tmp order by score desc").show
>>> [error]   ^
>>> [error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:19: value toDF is not a member of Seq[(String, Int)]
>>> [error]   a.toDF("Name","score").sort(desc("score")).show
>>> [error]     ^
>>> [error] three errors found
>>> [error] (compile:compileIncremental) Compilation failed
>>> [error] Total time: 88 s, completed Mar 4, 2016 11:12:46 PM
>>> 
>>> 
>>> 
>>> On 4 March 2016 at 22:52, Ted Yu <yuzhihong@gmail.com> wrote:
>>> Can you show your code snippet ?
>>> Here is an example:
>>> 
>>>       val sqlContext = new SQLContext(sc)
>>>       import sqlContext.implicits._
>>> 
>>> On Fri, Mar 4, 2016 at 1:55 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>> Hi Ted,
>>> 
>>>  I am getting the following error after adding that import
>>> 
>>> [error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:5: not found: object sqlContext
>>> [error] import sqlContext.implicits._
>>> [error]        ^
>>> [error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:15: value toDF is not a member of Seq[(String, Int)]
>>> 
>>> 
>>> 
>>> On 4 March 2016 at 21:39, Ted Yu <yuzhihong@gmail.com> wrote:
>>> Can you add the following into your code ?
>>>  import sqlContext.implicits._
>>> 
>>> On Fri, Mar 4, 2016 at 1:14 PM, Mich Talebzadeh <mich.talebzadeh@gmail.com> wrote:
>>> Hi,
>>> 
>>> I have a simple Scala program as below
>>> 
>>> import org.apache.spark.SparkContext
>>> import org.apache.spark.SparkContext._
>>> import org.apache.spark.SparkConf
>>> import org.apache.spark.sql.SQLContext
>>> object Sequence {
>>>   def main(args: Array[String]) {
>>>   val conf = new SparkConf().setAppName("Sequence")
>>>   val sc = new SparkContext(conf)
>>>   val sqlContext = new org.apache.spark.sql.SQLContext(sc)
>>>   val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>>   val a = Seq(("Mich",20), ("Christian", 18), ("James",13), ("Richard",16))
>>>   // Sort option 1 using tempTable
>>>   val b = a.toDF("Name","score").registerTempTable("tmp")
>>>   sql("select Name,score from tmp order by score desc").show
>>>   // Sort option 2 with FP
>>>   a.toDF("Name","score").sort(desc("score")).show
>>>  }
>>> }
>>> 
>>> I build this using sbt tool as below
>>> 
>>>  cat sequence.sbt
>>> name := "Sequence"
>>> version := "1.0"
>>> scalaVersion := "2.10.5"
>>> libraryDependencies += "org.apache.spark" %% "spark-core" % "1.5.0"
>>> libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.0.0"
>>> libraryDependencies += "org.apache.spark" %% "spark-hive" % "1.5.0"
>>> 
>>> 
>>> But it fails compilation as below
>>> 
>>> [info]   Compilation completed in 12.366 s
>>> [error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:15: value toDF is not a member of Seq[(String, Int)]
>>> [error]   val b = a.toDF("Name","score").registerTempTable("tmp")
>>> [error]             ^
>>> [error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:16: not found: value sql
>>> [error]   sql("select Name,score from tmp order by score desc").show
>>> [error]   ^
>>> [error] /home/hduser/dba/bin/scala/Sequence/src/main/scala/Sequence.scala:18: value toDF is not a member of Seq[(String, Int)]
>>> [error]   a.toDF("Name","score").sort(desc("score")).show
>>> [error]     ^
>>> [error] three errors found
>>> [error] (compile:compileIncremental) Compilation failed
>>> [error] Total time: 95 s, completed Mar 4, 2016 9:06:40 PM
>>> 
>>> I think I am missing some dependencies here
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 

