spark-user mailing list archives

From kundan kumar <iitr.kun...@gmail.com>
Subject Index wise most frequently occurring element
Date Tue, 27 Jan 2015 11:35:38 GMT
I have an array of the form

val array: Array[(Int, (String, Int))] = Array(
  (idx1,(word1,count1)),
  (idx2,(word2,count2)),
  (idx1,(word1,count1)),
  (idx3,(word3,count1)),
  (idx4,(word4,count4)))....

I want to get the top 10 and bottom 10 elements from this array for each
index (idx1, idx2, ...). Basically, I want the 10 most frequently occurring
and the 10 least frequently occurring elements for each index value.

Please suggest how to achieve this in Spark in the most efficient way. I have
tried using for loops over each index, but this makes the program too slow,
and it runs sequentially.
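
To make the desired result concrete, here is a minimal sketch of one possible approach using pair-RDD operations instead of a per-index loop. It is not a definitive answer: it assumes that repeated (index, word) entries should have their counts summed, that the values for any single index fit in memory on one executor (which groupByKey requires), and that ties may be broken arbitrarily. The object name TopBottomPerIndex, the helper topAndBottom, and the sample data are hypothetical and only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; only required before Spark 1.3
import org.apache.spark.rdd.RDD

object TopBottomPerIndex {
  // Hypothetical helper: for each index, returns (n most frequent, n least frequent).
  // Assumption: duplicate (index, word) entries are merged by summing their counts.
  def topAndBottom(
      data: RDD[(Int, (String, Int))],
      n: Int): RDD[(Int, (Seq[(String, Int)], Seq[(String, Int)]))] = {
    data
      .map { case (idx, (word, count)) => ((idx, word), count) }
      .reduceByKey(_ + _) // total count per (index, word), computed in parallel
      .map { case ((idx, word), total) => (idx, (word, total)) }
      .groupByKey() // gathers all (word, total) pairs of one index on one executor
      .mapValues { pairs =>
        // Sort by total count, highest first.
        val sorted = pairs.toSeq.sortBy { case (_, total) => -total }
        // First n are the most frequent; last n, reversed, are the least frequent.
        (sorted.take(n), sorted.takeRight(n).reverse)
      }
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TopBottomPerIndex").setMaster("local[*]")
    val sc = new SparkContext(conf)
    // Made-up sample data in the same shape as the array above.
    val array: Array[(Int, (String, Int))] = Array(
      (1, ("a", 5)), (1, ("b", 2)), (1, ("a", 1)), (2, ("c", 7)))
    topAndBottom(sc.parallelize(array), 10).collect().foreach(println)
    sc.stop()
  }
}
```

When an index has fewer than 2n distinct words, the top and bottom lists overlap. If a single index can have very many words, replacing groupByKey with aggregateByKey and a size-bounded collection per key would avoid holding every value for that index in memory; that variant is not shown here.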

Thanks,

Kundan
