spark-user mailing list archives

From Thomas Robert <tho...@creativedata.fr>
Subject Re: RDD registerAsTable gives error on regular scala class records
Date Thu, 10 Jul 2014 21:24:40 GMT
Hi,

I'm quite a Spark newbie so I might be wrong, but I think that
registerAsTable works either on case classes or on classes extending
Product.

You'll find this info in an example on the Spark SQL doc page:
http://spark.apache.org/docs/latest/sql-programming-guide.html

// Define the schema using a case class.
// Note: Case classes in Scala 2.10 can support only up to 22 fields.
// To work around this limit, you can use custom classes that implement
// the Product interface.
case class Person(name: String, age: Int)


If you want an example of a class extending Product, there is one in
the code of Sparkling Water:
https://github.com/0xdata/h2o-sparkling/blob/master/src/main/scala/water/sparkling/demo/Schemas.scala

class Airlines( year          :Option[Int],    // 0
                month         :Option[Int],    // 1
                dayOfMonth    :Option[Int],    // 2
                dayOfWeek     :Option[Int],    // 3
                crsDepTime    :Option[Int],    // 5
                crsArrTime    :Option[Int],    // 7
                uniqueCarrier :Option[String], // 8
                flightNum     :Option[Int],    // 9
                tailNum       :Option[Int],    // 10
                crsElapsedTime:Option[Int],    // 12
                origin        :Option[String], // 16
                dest          :Option[String], // 17
                distance      :Option[Int],    // 18
                isArrDelayed  :Option[Boolean],// 29
                isDepDelayed  :Option[Boolean] // 30
                ) extends Product {
...
}
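
The "..." above hides the Product boilerplate. In case the linked file
moves, the pattern looks roughly like this; this is a minimal sketch
using a made-up three-field Flight class, not the real Airlines one:

// Hypothetical example: a plain (non-case) class implementing Product
// so that Spark SQL can infer a schema from it.
class Flight(val origin: String, val dest: String, val distance: Int)
    extends Product with Serializable {

  // Number of fields in the schema.
  def productArity: Int = 3

  // Return the n-th field, in constructor order.
  def productElement(n: Int): Any = n match {
    case 0 => origin
    case 1 => dest
    case 2 => distance
    case _ => throw new IndexOutOfBoundsException(n.toString)
  }

  def canEqual(that: Any): Boolean = that.isInstanceOf[Flight]
}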


I managed to register tables larger than 22 columns with this method.
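
This also explains the compile error in your message: in Spark 1.0,
registerAsTable is not defined on RDD itself. It comes from the
implicit conversion sqlContext.createSchemaRDD, which only applies to
RDD[A] where A extends Product. Case classes extend Product
automatically; a plain class does not, so the method "is not a member"
of your RDD. A sketch of the full registration, assuming the
hypothetical Flight class above and an existing SparkContext sc:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
// Brings the implicit RDD[A <: Product] => SchemaRDD conversion into
// scope; registerAsTable lives on SchemaRDD, not on RDD.
import sqlContext.createSchemaRDD

// "flights.csv" is a made-up input with columns origin,dest,distance.
val flights = sc.textFile("flights.csv").map { line =>
  val f = line.split(",")
  new Flight(f(0), f(1), f(2).toInt)
}

flights.registerAsTable("flights")
val longHauls = sqlContext.sql(
  "SELECT origin, dest FROM flights WHERE distance > 1000")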

Bye.

-- 

*Thomas ROBERT*
www.creativedata.fr


2014-07-10 14:39 GMT+02:00 Kefah Issa <kefah@freesoft.jo>:

> Hi,
>
> SQL on Spark 1.0 is an interesting feature. It works fine when the
> "record" is made of a case class.
>
> The issue I have is that I have around 50 attributes per record. Scala
> case classes cannot handle that (the hard-coded limit is 22 for some
> reason). So I created a regular class and defined the attributes in there.
>
>
> // When running for the case-class I remove the "new"
> val rdd = sc.textFile("myrecords.csv").map(line => new
> Record(line.split(",")))
>
> rdd.registerAsTable("records")
>
>
> // This works
> // case class Record(first:String, second:String, third:String)
>
> // This causes the registerAsTable to fail
> class Record (list:Array[String]) {
> val first = list(0)
> val second = list(1)
> val third = list(2)
> }
>
>
> When compiling, I get the following error:
>
> value registerAsTable is not a member of org.apache.spark.rdd.RDD[Record]
>
> What am I missing here? Or is SQL/Spark 1.0 only capable of dealing with
> data sets with 22 columns max?
>
> Regards,
> - Kefah.
>
