spark-dev mailing list archives

From sim <>
Subject Scala API: simplifying common patterns
Date Sun, 07 Feb 2016 23:29:09 GMT
The more Spark code I write, the more I hit the same use cases where the
Scala APIs feel a bit awkward. I'd love to understand whether there are
historical reasons for these, and whether there is opportunity + interest to
improve the APIs. Here are my top two:
1. registerTempTable() returns Unit
def cachedDF(path: String, tableName: String) = {
  val df = ...
  df.registerTempTable(tableName)
  df
}

// vs.

def cachedDF(path: String, tableName: String) =
  ...
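For what it's worth, pattern 1 can be approximated today with an implicit "enrichment" class. Below is a minimal self-contained sketch: `DataFrame` and `TempTables` are stand-ins (the real version would wrap `org.apache.spark.sql.DataFrame`), and `registeredAs` is a hypothetical name, not part of Spark.

```scala
// Stand-in for Spark's DataFrame, mirroring the 1.x signature:
// registerTempTable registers the table and returns Unit.
final case class DataFrame(rows: Seq[String]) {
  def registerTempTable(name: String): Unit = TempTables.put(name, this)
}

// Stand-in for the session's temp-table catalog.
object TempTables {
  private var tables = Map.empty[String, DataFrame]
  def put(name: String, df: DataFrame): Unit = tables += (name -> df)
  def get(name: String): Option[DataFrame] = tables.get(name)
}

// The enrichment: behaves like registerTempTable but returns the
// DataFrame, so it can sit at the end of a method chain.
implicit class FluentDataFrame(df: DataFrame) {
  def registeredAs(tableName: String): DataFrame = {
    df.registerTempTable(tableName)
    df
  }
}

// With the implicit in scope, the helper collapses to one expression:
def cachedDF(path: String, tableName: String): DataFrame =
  DataFrame(Seq(path)).registeredAs(tableName)
```

The downside is that every caller has to import the implicit, which is why having `registerTempTable` return the DataFrame directly would be nicer.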
2. No toDF() implicit for creating a DataFrame from an RDD + schema
val schema: StructType = ...
val rdd = sc.textFile(...)
  .map(...)
  .aggregate(...)
val df = sqlContext.createDataFrame(rdd, schema)

// vs.

val schema: StructType = ...
val df = sc.textFile(...)
  .map(...)
  .aggregate(...)
  .toDF(schema)
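Pattern 2 can likewise be sketched as an implicit class today. In this self-contained sketch `RDD`, `StructType`, `DataFrame`, and `SqlStub` are all minimal stand-ins so it runs on its own; the real version would wrap `org.apache.spark.rdd.RDD[Row]` and delegate to `sqlContext.createDataFrame`.

```scala
// Minimal stand-ins for the Spark types involved.
final case class StructType(fieldNames: Seq[String])
final case class RDD[T](data: Seq[T])
final case class DataFrame(data: Seq[Seq[Any]], schema: StructType)

// Stand-in for sqlContext.createDataFrame(rdd, schema).
object SqlStub {
  def createDataFrame(rdd: RDD[Seq[Any]], schema: StructType): DataFrame =
    DataFrame(rdd.data, schema)
}

// The enrichment: toDF(schema) is sugar for createDataFrame, so a
// transformation chain can end in a DataFrame without binding the
// intermediate RDD to a temporary val first.
implicit class SchemaRDDOps(rdd: RDD[Seq[Any]]) {
  def toDF(schema: StructType): DataFrame =
    SqlStub.createDataFrame(rdd, schema)
}
```

With something like this in scope, `sc.textFile(...).map(...).toDF(schema)` would read the same way as the `toDF()` implicits Spark already provides for RDDs of case classes.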
Have you encountered other examples where small, low-risk API tweaks could
make common use cases more consistent + simpler to code?

Sent from the Apache Spark Developers List mailing list archive.