spark-user mailing list archives

From shyla deshpande <deshpandesh...@gmail.com>
Subject Re: Converting dataframe to dataset question
Date Thu, 23 Mar 2017 21:23:46 GMT
I made the code even simpler, but I am still getting the error:

error: value toDF is not a member of Seq[com.whil.batch.Teamuser]
[ERROR]     val df = Seq(Teamuser("t1","u1","r1")).toDF()

import org.apache.spark.sql.SparkSession

object Test {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder
      .appName(getClass.getSimpleName)
      .getOrCreate()
    import spark.implicits._
    val sqlContext = spark.sqlContext
    import sqlContext.implicits._
    val df = Seq(Teamuser("t1","u1","r1")).toDF()
    df.printSchema()
  }
}
case class Teamuser(teamid:String, userid:String, role:String)
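A likely cause (hedged, and not confirmed elsewhere in this thread): the program imports the implicits twice, once from `spark.implicits._` and once from `sqlContext.implicits._`. Both objects expose members with the same names, including the implicit conversion that adds `toDF` to a local `Seq`. When two wildcard imports in the same block bring in the same name, Scala 2 treats that name as ambiguous and will not use either one as an implicit, so `toDF` is not found. As far as I know, spark-shell only runs `import spark.implicits._` at startup, which would be consistent with the REPL session quoted below working. A minimal sketch with a single import, assuming Spark 2.1; `buildDF` is a hypothetical helper added only so the logic can be tested on its own:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Top-level case class (outside any object), so Spark can derive an Encoder for it.
case class Teamuser(teamid: String, userid: String, role: String)

object Test {
  // Hypothetical helper: builds the one-row DataFrame from the question.
  def buildDF(spark: SparkSession): DataFrame = {
    // Import the implicits exactly once; do not also import sqlContext.implicits._
    import spark.implicits._
    Seq(Teamuser("t1", "u1", "r1")).toDF()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName(getClass.getSimpleName)
      .getOrCreate()
    val df = buildDF(spark)
    df.printSchema()
    spark.stop()
  }
}
```

Submit it with spark-submit as before. If you run it outside spark-submit, the builder also needs a master, for example `.master("local[*]")`.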




On Thu, Mar 23, 2017 at 1:07 PM, Yong Zhang <java8964@hotmail.com> wrote:

> I'm not sure I understand the problem. Why can't I reproduce it?
>
>
> scala> spark.version
> res22: String = 2.1.0
>
> scala> case class Teamuser(teamid: String, userid: String, role: String)
> defined class Teamuser
>
> scala> val df = Seq(Teamuser("t1", "u1", "role1")).toDF
> df: org.apache.spark.sql.DataFrame = [teamid: string, userid: string ... 1 more field]
>
> scala> df.show
> +------+------+-----+
> |teamid|userid| role|
> +------+------+-----+
> |    t1|    u1|role1|
> +------+------+-----+
>
> scala> df.createOrReplaceTempView("teamuser")
>
> scala> val newDF = spark.sql("select teamid, userid, role from teamuser")
> newDF: org.apache.spark.sql.DataFrame = [teamid: string, userid: string ... 1 more field]
>
> scala> val userDS: Dataset[Teamuser] = newDF.as[Teamuser]
> userDS: org.apache.spark.sql.Dataset[Teamuser] = [teamid: string, userid: string ... 1 more field]
>
> scala> userDS.show
> +------+------+-----+
> |teamid|userid| role|
> +------+------+-----+
> |    t1|    u1|role1|
> +------+------+-----+
>
>
> scala> userDS.printSchema
> root
>  |-- teamid: string (nullable = true)
>  |-- userid: string (nullable = true)
>  |-- role: string (nullable = true)
>
>
> Am I missing anything?
>
>
> Yong
>
>
> ------------------------------
> *From:* shyla deshpande <deshpandeshyla@gmail.com>
> *Sent:* Thursday, March 23, 2017 3:49 PM
> *To:* user
> *Subject:* Re: Converting dataframe to dataset question
>
> I realized my case class was inside the object. It should be defined
> outside the scope of the object. Thanks.
>
> On Wed, Mar 22, 2017 at 6:07 PM, shyla deshpande <deshpandeshyla@gmail.com> wrote:
>
>> Why is userDS a Dataset[Any] instead of a Dataset[Teamuser]? I'd appreciate your help.
>> Thanks
>>
>>     val spark = SparkSession
>>       .builder
>>       .config("spark.cassandra.connection.host", cassandrahost)
>>       .appName(getClass.getSimpleName)
>>       .getOrCreate()
>>
>>     import spark.implicits._
>>     val sqlContext = spark.sqlContext
>>     import sqlContext.implicits._
>>
>>     case class Teamuser(teamid:String, userid:String, role:String)
>>     spark
>>       .read
>>       .format("org.apache.spark.sql.cassandra")
>>       .options(Map("keyspace" -> "test", "table" -> "teamuser"))
>>       .load
>>       .createOrReplaceTempView("teamuser")
>>
>>     val userDF = spark.sql("SELECT teamid, userid, role FROM teamuser")
>>
>>     userDF.show()
>>
>>     val userDS:Dataset[Teamuser] = userDF.as[Teamuser]
>>
>>
>
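Applied to the original question, the fix reported in the thread (declaring `Teamuser` outside the object instead of inside it) could look like the sketch below. It assumes Spark 2.x and keeps a single `import spark.implicits._`; `TeamuserConversion` and `toTeamusers` are hypothetical names. The function accepts any DataFrame with `teamid`, `userid` and `role` string columns, so it behaves the same whether `userDF` comes from the Cassandra query in the question or from a small local DataFrame.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Declared at the top level, outside any object or method, per the fix reported in the thread.
case class Teamuser(teamid: String, userid: String, role: String)

object TeamuserConversion {
  // Converts a DataFrame with teamid/userid/role string columns into a typed Dataset.
  // Columns are matched to case class fields by name.
  def toTeamusers(spark: SparkSession, userDF: DataFrame): Dataset[Teamuser] = {
    import spark.implicits._ // supplies the Encoder[Teamuser] that .as needs
    userDF.as[Teamuser]
  }
}
```

In the question's code this would replace the last line with `val userDS: Dataset[Teamuser] = TeamuserConversion.toTeamusers(spark, userDF)`, with the case class moved out of `main`.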
