spark-user mailing list archives

From Ron Ayoub <>
Subject Java RDD Union
Date Fri, 05 Dec 2014 19:27:03 GMT
I'm a bit confused about the expected behavior of union. I'm running on 8 cores. I have an
RDD that collects cluster associations (cluster id, content id, distance) for internal
clusters as well as leaf clusters, since I'm doing hierarchical k-means and need all distances
to sort documents appropriately upon examination.
It appears that union simply adds the items in the argument to the RDD instance the method is
called on, rather than just returning a new RDD. If I want to use union this way, as more of
an add/append, should I be capturing the return value and releasing it from memory? I need help
clarifying the semantics here.
Also, in another related thread someone mentioned calling coalesce after union. Would I need to
do the same on the instance RDD I'm calling union on?
Perhaps a method such as append would be useful and clearer.
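
To make the question concrete, here is a minimal sketch of the pattern being asked about. It
assumes a local JavaSparkContext with 8 threads; the class name UnionSemantics, the RDD names,
and the string records standing in for (cluster id, content id, distance) tuples are all
placeholders, not code from the actual job. Per the Spark API documentation, JavaRDD.union
returns a new RDD and RDDs are immutable, so the sketch captures the return value and compares
counts before and after, then coalesces the result since a union keeps the partitions of both
inputs.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class UnionSemantics {

    // Returns {internal count, leaf count, union count,
    //          union partition count, coalesced partition count}.
    static long[] run() {
        SparkConf conf = new SparkConf()
                .setAppName("union-semantics")
                .setMaster("local[8]"); // assumption: local mode, 8 threads
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            // Placeholder records; a real job would hold
            // (cluster id, content id, distance) tuples.
            JavaRDD<String> internal =
                    sc.parallelize(Arrays.asList("c1", "c2", "c3"), 4);
            JavaRDD<String> leaf =
                    sc.parallelize(Arrays.asList("c4", "c5"), 4);

            // union returns a new RDD; the result must be captured.
            JavaRDD<String> all = internal.union(leaf);

            // Reduce the partition count of the unioned RDD (no shuffle).
            JavaRDD<String> compact = all.coalesce(4);

            return new long[] {
                internal.count(),            // unchanged by the union
                leaf.count(),
                all.count(),
                all.partitions().size(),     // partitions of both inputs
                compact.partitions().size()
            };
        } finally {
            sc.stop();
        }
    }

    public static void main(String[] args) {
        long[] r = run();
        System.out.println("internal=" + r[0] + " leaf=" + r[1]
                + " union=" + r[2] + " unionPartitions=" + r[3]
                + " coalescedPartitions=" + r[4]);
    }
}
```

If the count of the receiver stays the same after the call, the add/append behavior is coming
from somewhere else, such as reassigning the same variable to the result in a loop. In that
case coalesce would be applied to the returned RDD, not to the instance union was called on.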