spark-user mailing list archives

From "老赵" <>
Subject how to split key from RDD for compute UV
Date Tue, 27 Jan 2015 12:14:35 GMT
Hello All,

I am writing a simple Spark application to count UV (unique views) from a log file. Below is my code; it is not right on the red line. My idea is that the same cookie on a host should be counted only once, so I want to split the host out of the key of the previous RDD. But I don't know how to finish it. Any suggestions would be appreciated!

    val url_index = args(1).toInt
    val cookie_index = args(2).toInt
    val textRDD = sc.textFile(args(0))
        .map(_.split("\t"))
        .map(line => ((new
    + "\t" + line(cookie_index), 1))
        .reduceByKey(_ + _)
        .map(line => (line.split("\t")(0), 1))
        .reduceByKey(_ + _)
        .map(item => item.swap)
        .sortByKey(false)
        .map(item => item.swap)
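
One possible way to finish the pipeline described above, offered only as a sketch. After the first reduceByKey each element is a (String, Int) pair, so calling split on the element itself cannot work; the key would have to be read with line._1 first. The sketch below avoids the string concatenation altogether by keeping (host, cookie) as a tuple. The object name UvSketch and the helpers hostOf and uvPerHost are hypothetical names introduced for illustration, and extracting the host with java.net.URI is an assumption, since that part of the original expression is not visible in the message.

```scala
import java.net.{URI, URISyntaxException}

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; only required on Spark versions before 1.3
import org.apache.spark.rdd.RDD

object UvSketch {
  // Hypothetical helper: returns the host part of a URL string,
  // or "" when the string cannot be parsed or has no host.
  def hostOf(url: String): String =
    try Option(new URI(url).getHost).getOrElse("")
    catch { case _: URISyntaxException => "" }

  // Counts distinct cookies per host, highest count first.
  // Assumes every row has at least max(urlIndex, cookieIndex) + 1 fields.
  def uvPerHost(rows: RDD[Array[String]], urlIndex: Int, cookieIndex: Int): RDD[(String, Int)] =
    rows
      .map(fields => (hostOf(fields(urlIndex)), fields(cookieIndex))) // key is a tuple, not a joined string
      .distinct()                                                     // one record per (host, cookie)
      .map { case (host, _) => (host, 1) }                            // drop the cookie, keep the host
      .reduceByKey(_ + _)                                             // unique cookies per host
      .sortBy(_._2, ascending = false)

  def main(args: Array[String]): Unit = {
    // Same argument order as in the message: input path, URL column, cookie column.
    val sc = new SparkContext(new SparkConf().setAppName("uv-sketch"))
    val rows = sc.textFile(args(0)).map(_.split("\t"))
    uvPerHost(rows, args(1).toInt, args(2).toInt).collect().foreach(println)
    sc.stop()
  }
}
```

Using distinct() on the (host, cookie) pairs replaces the first reduceByKey, whose summed counts were discarded in the next step anyway. Rows with an unparseable URL end up under the empty host "" and may need to be filtered out.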
