spark-dev mailing list archives

From Zhiwei Chan <z.w.chan.ja...@gmail.com>
Subject Spark SQL(1.3.0) "import sqlContext.implicits._" seems not work for converting a case class RDD to DataFrame
Date Wed, 25 Mar 2015 03:46:24 GMT
Hi all,

  I just upgraded Spark from 1.2.1 to 1.3.0 and changed "import
sqlContext.createSchemaRDD" to "import sqlContext.implicits._" in my code
(I scanned the programming guide, and this seems to be the only change I
need to make). But compilation now fails with the following error:
>>>
[ERROR] ...\magic.scala:527: error: value registerTempTable is not a member
of org.apache.spark.rdd.RDD[com.yhd.ycache.magic.Table]
[INFO]     tableRdd.registerTempTable(tableName)
<<<

Then I tried the exact example from the 1.3 programming guide in
spark-shell, and it fails with the same error.
>>>
scala> sys.env.get("CLASSPATH")
res7: Option[String] =
Some(:/root/scala/spark-1.3.0-bin-hadoop2.4/conf:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/spark-assembly-1.3.0-hadoop2.4.0.jar:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar)

scala>  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext =
org.apache.spark.sql.SQLContext@4b05b3ff

scala>  import sqlContext.implicits._
import sqlContext.implicits._

scala>  case class Person(name: String, age: Int)
defined class Person

scala>   val t1 = sc.textFile("hdfs://heju:8020/user/root/magic/poolInfo.txt")
15/03/25 11:13:35 INFO MemoryStore: ensureFreeSpace(81443) called with
curMem=186397, maxMem=278302556
15/03/25 11:13:35 INFO MemoryStore: Block broadcast_3 stored as values in
memory (estimated size 79.5 KB, free 265.2 MB)
15/03/25 11:13:35 INFO MemoryStore: ensureFreeSpace(31262) called with
curMem=267840, maxMem=278302556
15/03/25 11:13:35 INFO MemoryStore: Block broadcast_3_piece0 stored as
bytes in memory (estimated size 30.5 KB, free 265.1 MB)
15/03/25 11:13:35 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory
on heju:48885 (size: 30.5 KB, free: 265.4 MB)
15/03/25 11:13:35 INFO BlockManagerMaster: Updated info of block
broadcast_3_piece0
15/03/25 11:13:35 INFO SparkContext: Created broadcast 3 from textFile at
<console>:34
t1: org.apache.spark.rdd.RDD[String] =
hdfs://heju:8020/user/root/magic/poolInfo.txt MapPartitionsRDD[9] at
textFile at <console>:34

scala>  val t2 = t1.flatMap(_.split("\n")).map(_.split(" ")).map(p => Person(p(0),1))
t2: org.apache.spark.rdd.RDD[Person] = MapPartitionsRDD[12] at map at
<console>:38

scala>  t2.registerTempTable("people")
<console>:41: error: value registerTempTable is not a member of
org.apache.spark.rdd.RDD[Person]
               t2.registerTempTable("people")
                  ^
<<<

I found the following explanation in the programming guide about implicitly
converting case classes to DataFrames, but I don't understand what I should
do. Could anyone tell me how to convert an RDD of case classes to a
DataFrame?

>>>
Isolation of Implicit Conversions and Removal of dsl Package (Scala-only)

Many of the code examples prior to Spark 1.3 started with import
sqlContext._, which brought all of the functions from sqlContext into
scope. In Spark 1.3 we have isolated the implicit conversions for
converting RDDs into DataFrames into an object inside of the SQLContext.
Users should now write import sqlContext.implicits._.

Additionally, the implicit conversions now only augment RDDs that are
composed of Products (i.e., case classes or tuples) with a method toDF,
instead of applying automatically.

<<<
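For reference, here is a minimal sketch of what the guide passage describes, assuming Spark 1.3.0 with spark-sql on the classpath and a local master. With "import sqlContext.implicits._" in scope, an RDD[Person] gains only a toDF() method, and registerTempTable is then called on the resulting DataFrame rather than on the RDD itself. The object name ToDFExample, the method olderThan26, and the two sample records are hypothetical stand-ins for the HDFS file used in the transcript.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Declared at top level (not inside a method) so the compiler can supply
// the TypeTag that the RDD-to-DataFrame conversion needs.
case class Person(name: String, age: Int)

object ToDFExample {
  // Hypothetical helper: builds a tiny RDD[Person], converts it to a
  // DataFrame, registers it as a temp table, and queries it with SQL.
  def olderThan26(sc: SparkContext): Array[String] = {
    val sqlContext = new SQLContext(sc)
    // In 1.3 this import adds toDF() to RDDs of Products (case classes, tuples).
    import sqlContext.implicits._

    // Hypothetical in-memory sample data instead of the HDFS file.
    val people = sc.parallelize(Seq("alice 30", "bob 25"))
      .map(_.split(" "))
      .map(p => Person(p(0), p(1).trim.toInt))

    // The explicit step missing in the transcript: RDD -> DataFrame first.
    val peopleDF = people.toDF()
    // registerTempTable is a DataFrame method, not an RDD method.
    peopleDF.registerTempTable("people")

    sqlContext.sql("SELECT name FROM people WHERE age > 26")
      .collect()
      .map(_.getString(0))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ToDFExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try olderThan26(sc).foreach(println)
    finally sc.stop()
  }
}
```

Applied to the compile error above, the analogous change would presumably be tableRdd.toDF().registerTempTable(tableName), and in the shell t2.toDF().registerTempTable("people"). sqlContext.createDataFrame(rdd) is an explicit alternative in 1.3 that does not rely on the implicit import.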
Thanks
Jason
