spark-dev mailing list archives

From Alexander Pivovarov
Subject Dataset API agg question
Date Tue, 07 Jun 2016 22:58:47 GMT
I'm trying to switch from the RDD API to the Dataset API.
My question is about the reduceByKey method.

For example, I'm trying to rewrite

sc.parallelize(Seq(1->2, 1->5, 3->6)).reduceByKey(math.max).take(10)

using the Dataset API. This is what I have so far:

Seq(1->2, 1->5,


1. Is it possible to avoid typing "as(ExpressionEncoder[Int])", or to replace
it with something shorter?

2. Why do I have to use a String column name in the max function, e.g. $"_2" or
col("_2")? Can I use _._2 instead?
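
To make the two questions concrete, here is a minimal sketch of two ways the
reduceByKey example might be expressed with Datasets, assuming Spark 2.x and
a SparkSession passed in by the caller. The object and method names
(ReduceByKeyWithDataset, maxByKeyColumn, maxByKeyTyped) are hypothetical and
only for illustration. The first variant relies on the implicit encoders
from spark.implicits._ so that as[Int] can stand in for
as(ExpressionEncoder[Int]); the second uses reduceGroups, which takes a plain
Scala function, so _._1 and _._2 can be used instead of a column name.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.max

// Hypothetical helper object; the names here are not part of Spark.
object ReduceByKeyWithDataset {

  // Column-based variant: max still takes a Column, but the implicit
  // encoder brought in by spark.implicits._ lets us write .as[Int].
  def maxByKeyColumn(spark: SparkSession, pairs: Seq[(Int, Int)]): Map[Int, Int] = {
    import spark.implicits._
    pairs.toDS()
      .groupByKey(_._1)
      .agg(max($"_2").as[Int]) // Dataset[(Int, Int)]: (key, max value)
      .collect()
      .toMap
  }

  // Lambda-based variant: reduceGroups works on whole records, so the
  // comparison is written with tuple accessors rather than column names.
  def maxByKeyTyped(spark: SparkSession, pairs: Seq[(Int, Int)]): Map[Int, Int] = {
    import spark.implicits._
    pairs.toDS()
      .groupByKey(_._1)
      .reduceGroups((a: (Int, Int), b: (Int, Int)) => if (a._2 >= b._2) a else b)
      .map { case (_, kv) => kv } // drop the duplicated key: (key, (key, value)) -> (key, value)
      .collect()
      .toMap
  }
}
```

The column-based variant is evaluated as a SQL aggregate expression, while
the reduceGroups variant runs an opaque Scala function over deserialized
records, so the two may perform differently on large inputs.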

