I'm fairly new to Spark, so I might be wrong, but I think registerAsTable works only on RDDs of case classes or of classes extending Product.

You can find this in an example on the Spark SQL documentation page: http://spark.apache.org/docs/latest/sql-programming-guide.html

// Define the schema using a case class.
// Note: Case classes in Scala 2.10 can support only up to 22 fields. To work around this limit, 
// you can use custom classes that implement the Product interface.
case class Person(name: String, age: Int)
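
To see why the two options are really the same thing, here is a small check in plain Scala (no Spark needed). It only shows that a case class is already a Product; the ProductCheck object and the sample values are made up for illustration.

```scala
// Same shape as the Person class from the Spark SQL guide.
case class Person(name: String, age: Int)

// Hypothetical helper object, only here to run the check.
object ProductCheck {
  def main(args: Array[String]): Unit = {
    // The compiler makes every case class extend Product,
    // so this assignment needs no conversion.
    val p: Product = Person("Ann", 30)

    println(p.productArity)                    // prints 2
    println(p.productIterator.mkString(", "))  // prints Ann, 30
  }
}
```

So a hand-written class only has to provide the same Product members that the compiler generates for a case class.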

If you want an example of a class extending Product, there is one in the Sparkling Water code:

class Airlines( year :Option[Int], // 0
                month :Option[Int], // 1
                dayOfMonth :Option[Int], // 2
                dayOfWeek :Option[Int], // 3
                crsDepTime :Option[Int], // 5
                crsArrTime :Option[Int], // 7
                uniqueCarrier :Option[String], // 8
                flightNum :Option[Int], // 9
                tailNum :Option[Int], // 10
                crsElapsedTime:Option[Int], // 12
                origin :Option[String], // 16
                dest :Option[String], // 17
                distance :Option[Int], // 18
                isArrDelayed :Option[Boolean],// 29
                isDepDelayed :Option[Boolean] // 30
                ) extends Product {

I managed to register tables larger than 22 columns with this method.
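
Here is a minimal sketch of what such a class can look like, cut down to three fields to keep it short. The Record name and its fields are only placeholders; a real class would list all of its columns the same way. As far as I can tell, Spark SQL takes the column names from the constructor parameters, so I declare the fields there rather than as vals in the class body. Treat that as my assumption, not as a documented guarantee.

```scala
// Hypothetical record type; add one constructor parameter per column.
class Record(val first: String, val second: String, val third: String)
    extends Product with Serializable {

  // Number of fields this Product exposes.
  override def productArity: Int = 3

  // Zero-based access to the fields, in constructor order.
  override def productElement(n: Int): Any = n match {
    case 0 => first
    case 1 => second
    case 2 => third
    case _ => throw new IndexOutOfBoundsException(n.toString)
  }

  // Product extends Equals, so canEqual has to be implemented too.
  override def canEqual(that: Any): Boolean = that.isInstanceOf[Record]
}
```

Only productArity, productElement and canEqual have to be written by hand; productIterator comes for free from the Product trait. The sketch covers just the Product part, so registering the table still goes through the usual Spark SQL steps from the guide.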




SQL on Spark 1.0 is an interesting feature. It works fine when the "record" is a case class.

The issue is that I have around 50 attributes per record, and a Scala case class cannot handle that (the limit is hard-coded to 22 for some reason). So I created a regular class and defined the attributes in there.

// When running with the case class, I remove the "new"
val rdd = sc.textFile("myrecords.csv").map(line => new Record(line.split(",")))


// This works
// case class Record(first:String, second:String, third:String)

// This causes the registerAsTable to fail
class Record(list: Array[String]) {
  val first = list(0)
  val second = list(1)
  val third = list(2)
}

When compiling, I get the following error:

value registerAsTable is not a member of org.apache.spark.rdd.RDD[Record]

What am I missing here? Or is Spark SQL 1.0 only capable of dealing with data sets of at most 22 columns?

- Kefah.