spark-user mailing list archives

From Adrian Mocanu
Subject remove duplicates
Date Mon, 24 Mar 2014 16:44:59 GMT
I have a DStream like this:

Is there a way to remove duplicates across the entire DStream? That is, I would like the output
to be one of the following (with one of the b's removed):
..RDD[a],RDD[b,c].. or ..RDD[a,b],RDD[c]..
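
To make the desired semantics concrete, here is a minimal sketch in plain Python that models the stream as a sequence of batches (lists) rather than using Spark itself. The function name `dedupe_across_batches` and the sample input are hypothetical; the input is only an assumption chosen to be consistent with the second output above, keeping the first occurrence of each element.

```python
from typing import Hashable, Iterable, Iterator, List


def dedupe_across_batches(
    batches: Iterable[List[Hashable]],
) -> Iterator[List[Hashable]]:
    """Yield each batch with elements already seen earlier removed.

    "Earlier" covers both previous batches and earlier positions
    in the same batch, so the first occurrence always wins.
    """
    seen = set()  # every element emitted so far, across all batches
    for batch in batches:
        fresh = []
        for item in batch:
            if item not in seen:
                seen.add(item)
                fresh.append(item)
        yield fresh


# Hypothetical input: two consecutive batches sharing the element "b".
stream = [["a", "b"], ["b", "c"]]
result = list(dedupe_across_batches(stream))
print(result)  # [['a', 'b'], ['c']]
```

Note that the `seen` set grows without bound. On a real DStream the equivalent state would have to be carried from one batch to the next (for example with a stateful operation such as `updateStateByKey`, which requires checkpointing), and would probably need some form of expiry.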

