spark-user mailing list archives

From Pankaj Narang <pankajnaran...@gmail.com>
Subject Re: Finding most occurrences in a JSON Nested Array
Date Mon, 05 Jan 2015 16:17:50 GMT
Try the following:

results.map(row => row(1)).collect

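Then try

val hobbies = results.flatMap(row => row(1).asInstanceOf[Seq[String]])

This flattens all the hobbies into one simple collection (the cast assumes column 1 holds the hobbies as a sequence of strings).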

val hbmap = hobbies.map(hobby => (hobby, 1)).reduceByKey((hobcnt1, hobcnt2) => hobcnt1 + hobcnt2)

This aggregates the hobbies into (hobby, count) pairs, for example:

(swimming,2), (hiking,1)


Now

hbmap.map { case (hobby, count) => (count, hobby) }.sortByKey(ascending = false).collect

will give you the hobbies sorted in descending order of their count.
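
Putting the steps together, a minimal end-to-end sketch might look like the following. It is not taken from the original question: the object name HobbyCounts, the helper rank, and the sample data are hypothetical, and it assumes Spark 1.3 or later on the classpath and a local master. On older releases the pair-RDD methods also need `import org.apache.spark.SparkContext._`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper object, used only for illustration.
object HobbyCounts {

  // Takes one list of hobbies per record and returns (count, hobby)
  // pairs, most frequent hobby first.
  def rank(hobbyLists: RDD[Seq[String]]): Array[(Int, String)] =
    hobbyLists
      .flatMap(hobbies => hobbies)                    // one element per hobby
      .map(hobby => (hobby, 1))                       // pair each hobby with 1
      .reduceByKey(_ + _)                             // sum the 1s per hobby
      .map { case (hobby, count) => (count, hobby) }  // make the count the key
      .sortByKey(ascending = false)                   // highest count first
      .collect()

  def main(args: Array[String]): Unit = {
    // Local master so the sketch runs without a cluster (an assumption).
    val conf = new SparkConf().setAppName("HobbyCounts").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample standing in for the hobbies column of `results`.
      val sample = sc.parallelize(Seq(Seq("swimming", "hiking"), Seq("swimming")))
      rank(sample).foreach(println)  // prints (2,swimming) then (1,hiking)
    } finally {
      sc.stop()
    }
  }
}
```

Hobbies with equal counts may come back in any order relative to each other, since only the count is used as the sort key.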
 
This is pseudocode, but it should help you get started.

Regards
Pankaj






--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Finding-most-occurrences-in-a-JSON-Nested-Array-tp20971p20975.html

