I did import that; it was the first command in my previous email. I'm using the spark-shell.

scala> import org.apache.spark.sql.Encoder
import org.apache.spark.sql.Encoder

scala>


Any comments regarding importing implicits in an application?
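
For reference, the pattern I'd expect to use in a standalone application is roughly the following (a simplified sketch; the app name is just a placeholder):

import org.apache.spark.sql.SparkSession

object EtlApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("etl-example").getOrCreate()

    // Brings the implicit Encoders for case classes and primitives into scope.
    import spark.implicits._

    // ... define case classes and Datasets here ...

    spark.stop()
  }
}

Is that the recommended approach, or is the SQLImplicits workaround from my earlier email still needed when the classes live in separate files without a SparkSession in scope?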

Thanks.

-Don

On Wed, Feb 1, 2017 at 6:10 PM, Michael Armbrust <michael@databricks.com> wrote:
This is the error; you are missing an import:

<console>:13: error: not found: type Encoder
       abstract class RawTable[A : Encoder](inDir: String) {

Works for me in a REPL.
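
A minimal version of what I ran, roughly:

import org.apache.spark.sql.Encoder
import spark.implicits._

abstract class RawTable[A : Encoder](inDir: String) {
  def load() = spark.read.option("header", "true").csv(inDir).as[A]
}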

On Wed, Feb 1, 2017 at 3:34 PM, Don Drake <dondrake@gmail.com> wrote:
Thanks for the reply. I did try that syntax ([A : Encoder]) yesterday, but I kept getting this error in both the spark-shell and a Zeppelin notebook.

scala> import org.apache.spark.sql.Encoder
import org.apache.spark.sql.Encoder

scala>

scala> case class RawTemp(f1: String, f2: String, temp: Long, created_at: java.sql.Timestamp, data_filename: String)
defined class RawTemp

scala>

scala> import spark.implicits._
import spark.implicits._

scala>

scala> abstract class RawTable[A : Encoder](inDir: String) {
     |     import spark.implicits._
     |     def load() = {
     |         import spark.implicits._
     |         spark.read
     |             .option("header", "true")
     |             .option("mode", "FAILFAST")
     |             .option("escape", "\"")
     |             .option("nullValue", "")
     |             .option("indferSchema", "true")
     |             .csv(inDir)
     |             .as[A]
     |     }
     | }
<console>:13: error: not found: type Encoder
       abstract class RawTable[A : Encoder](inDir: String) {
                                   ^
<console>:24: error: Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.
                   .as[A]


I gave it a try today in a compiled Scala application and it seems to work. Is this a known issue with the spark-shell?

In my Scala application, this class is defined in a separate file, without direct access to a SparkSession.

I had to add the following snippet to get the equivalent of import spark.implicits._ to take effect:

// Ugly hack to work around "Encoder not found" compile-time errors.
import org.apache.spark.sql.{SQLContext, SQLImplicits}

private object myImplicits extends SQLImplicits {
  protected override def _sqlContext: SQLContext = MySparkSingleton.getCurrentSession().sqlContext
}

import myImplicits._

I found that after searching through what felt like a hundred Stack Overflow posts on this problem. Is this the best way to bring the implicits into scope?
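
For context, the file that builds on that myImplicits object looks roughly like this (a simplified sketch; the read options are trimmed and names are illustrative):

import org.apache.spark.sql.{Dataset, Encoder}
// No SparkSession val is in scope in this file, so myImplicits supplies the
// implicit Encoders that import spark.implicits._ would normally provide.
import myImplicits._

abstract class RawTable[A : Encoder](inDir: String) {
  def load(): Dataset[A] =
    MySparkSingleton.getCurrentSession().read
      .option("header", "true")
      .csv(inDir)
      .as[A]
}

// The [A : Encoder] context bound is satisfied here by the Encoder[RawTemp]
// that import myImplicits._ brings into scope.
class TempTable extends RawTable[RawTemp]("/user/drake/t.csv")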

Thanks.

-Don



On Wed, Feb 1, 2017 at 3:16 PM, Michael Armbrust <michael@databricks.com> wrote:
You need to enforce that an Encoder is available for the type A using a context bound.

import org.apache.spark.sql.Encoder
abstract class RawTable[A : Encoder](inDir: String) {
  ...
}
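
The context bound is just syntactic sugar for an implicit constructor parameter, which is what .as[A] resolves; the class above is equivalent to:

import org.apache.spark.sql.Encoder
abstract class RawTable[A](inDir: String)(implicit enc: Encoder[A]) {
  ...
}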

On Tue, Jan 31, 2017 at 8:12 PM, Don Drake <dondrake@gmail.com> wrote:
I have a set of CSV files that I need to perform ETL on, and I plan to re-use a lot of code between files via a parent abstract class.

I tried creating the following simple abstract class, parameterized by the type of a case class that represents the schema being read in.

This won't compile; it complains about not being able to find an encoder, but I'm importing the implicits and don't understand the error.


scala> import spark.implicits._
import spark.implicits._

scala>

scala> case class RawTemp(f1: String, f2: String, temp: Long, created_at: java.sql.Timestamp, data_filename: String)
defined class RawTemp

scala>

scala> abstract class RawTable[A](inDir: String) {
     |     def load() = {
     |         spark.read
     |             .option("header", "true")
     |             .option("mode", "FAILFAST")
     |             .option("escape", "\"")
     |             .option("nullValue", "")
     |             .option("indferSchema", "true")
     |             .csv(inDir)
     |             .as[A]
     |     }
     | }
<console>:27: error: Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.
                   .as[A]

scala> class TempTable extends RawTable[RawTemp]("/user/drake/t.csv")
<console>:13: error: not found: type RawTable
       class TempTable extends RawTable[RawTemp]("/user/drake/t.csv")
                      ^

What's odd is that this output looks okay:

scala> val RTEncoder = Encoders.product[RawTemp]
RTEncoder: org.apache.spark.sql.Encoder[RawTemp] = class[f1[0]: string, f2[0]: string, temp[0]: bigint, created_at[0]: timestamp, data_filename[0]: string]

scala> RTEncoder.schema
res4: org.apache.spark.sql.types.StructType = StructType(StructField(f1,StringType,true), StructField(f2,StringType,true), StructField(temp,LongType,false), StructField(created_at,TimestampType,true), StructField(data_filename,StringType,true))

scala> RTEncoder.clsTag
res5: scala.reflect.ClassTag[RawTemp] = RawTemp
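
I'd even expect passing that encoder explicitly to compile for the concrete type, e.g. something roughly like:

spark.read.option("header", "true").csv("/user/drake/t.csv").as[RawTemp](RTEncoder)

but the generic A in the abstract class still fails.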

Any ideas?

--
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
https://twitter.com/dondrake
800-733-2143