spark-user mailing list archives

From Selvam Raman <sel...@gmail.com>
Subject Data frame Performance
Date Tue, 16 Aug 2016 11:06:12 GMT
Hi All,

Please suggest the best approach to achieve the result below, and comment on
whether my existing logic is fine.

Input Record :

ColA ColB ColC
1 2 56
1 2 46
1 3 45
1 5 34
1 5 90
2 1 89
2 5 45

Expected Result :

ResA  ResB
1     2:2|3:3|5:5
2     1:1|5:5
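
To make the expected result concrete, here is a minimal sketch in plain Scala
collections (no Spark) that derives it from the input record. The object name
ExpectedResultSketch and the function summarize are hypothetical names used
only for illustration, and sorting the ColB values is an assumption made so
the output is deterministic and matches the order shown in the expected result.

```scala
// Hypothetical helper for illustration only: reproduces the expected
// result with plain Scala collections so the target format is unambiguous.
object ExpectedResultSketch {
  // (ColA, ColB, ColC) rows copied from the input record.
  val rows: Seq[(Int, Int, Int)] = Seq(
    (1, 2, 56), (1, 2, 46), (1, 3, 45), (1, 5, 34), (1, 5, 90),
    (2, 1, 89), (2, 5, 45)
  )

  // Group by ColA, keep the distinct ColB values (sorted, by assumption),
  // render each value b as "b:b", and join the pairs with "|".
  def summarize(in: Seq[(Int, Int, Int)]): Seq[(Int, String)] =
    in.groupBy(_._1)
      .toSeq
      .sortBy(_._1)
      .map { case (a, grp) =>
        val resB = grp.map(_._2).distinct.sorted.map(b => s"$b:$b").mkString("|")
        (a, resB)
      }

  def main(args: Array[String]): Unit =
    summarize(rows).foreach { case (a, b) => println(s"$a\t$b") }
}
```

ColC plays no part in the result; only the distinct ColB values per ColA are
kept.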

I followed the Spark steps below.

(Spark version - 1.5.0)

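import scala.collection.mutable.WrappedArray

// Render each collected value e as "e:e" and join the pairs with "|".
def valsplit(elem: WrappedArray[String]): String =
  elem.map(e => e + ":" + e).mkString("|")

// Register the function as a SQL UDF named "valudf".
sqlContext.udf.register("valudf", valsplit(_: WrappedArray[String]))

// collect_set gathers the distinct requests per site;
// first returns only the first row of the grouped result.
val x = sqlContext.sql(
  "select site, valudf(collect_set(requests)) as test from sel group by site").first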



-- 
Selvam Raman
"Shun bribery, hold your head high"
