spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: Java RDD Union
Date Fri, 05 Dec 2014 20:22:38 GMT
No, RDDs are immutable. union() creates a new RDD, and does not modify
an existing RDD. Maybe this obviates the question. I'm not sure what
you mean about releasing from memory. If you want to repartition the
unioned RDD, you repartition the result of union(), not anything else.
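
A minimal sketch of the above in Java, in case it helps. The class name
UnionExample, the demo() helper, the string records, and the partition counts
are made up for illustration and are not from your code; it assumes a local
master and only uses parallelize(), union(), coalesce(), count() and
partitions() from the Java API.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class UnionExample {

    // Hypothetical helper. Returns, in order: the element count of 'internal'
    // after the union, the element count of the union, the partition count of
    // the union, and the partition count after coalesce.
    static long[] demo(JavaSparkContext sc) {
        // Two small RDDs with 2 partitions each (stand-ins for real records).
        JavaRDD<String> internal = sc.parallelize(Arrays.asList("a", "b"), 2);
        JavaRDD<String> leaf = sc.parallelize(Arrays.asList("c", "d", "e"), 2);

        // union() returns a NEW RDD; 'internal' and 'leaf' are not modified.
        // To get append-like behavior, keep the returned reference.
        JavaRDD<String> all = internal.union(leaf);

        // The unioned RDD keeps the partitions of both inputs (here 2 + 2),
        // so if that is too many, coalesce or repartition the RESULT.
        JavaRDD<String> compact = all.coalesce(2);

        return new long[] {
            internal.count(),            // still 2 elements
            all.count(),                 // 5 elements
            all.partitions().size(),     // 4 partitions
            compact.partitions().size()  // 2 partitions
        };
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("UnionExample")
                .setMaster("local[8]"); // assumption: 8 local cores, as in the question
        JavaSparkContext sc = new JavaSparkContext(conf);
        try {
            System.out.println(Arrays.toString(demo(sc)));
        } finally {
            sc.stop();
        }
    }
}
```

If you are accumulating in a loop, the usual pattern is to reassign the
variable each time, e.g. all = all.union(next), since nothing is changed in
place.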

On Fri, Dec 5, 2014 at 1:27 PM, Ron Ayoub <ronaldayoub@live.com> wrote:
> I'm a bit confused regarding expected behavior of unions. I'm running on 8
> cores. I have an RDD that is used to collect cluster associations (cluster
> id, content id, distance) for internal clusters as well as leaf clusters
> since I'm doing hierarchical k-means and need all distances for sorting
> documents appropriately upon examination.
>
> It appears that Union simply adds items in the argument to the RDD instance
> the method is called on rather than just returning a new RDD. If I want to
> do Union this way, as more of an add/append, should I be capturing the return
> value and releasing it from memory? Need help clarifying the semantics here.
>
> Also, in another related thread someone mentioned coalesce after union.
> Would I need to do the same on the instance RDD I'm calling Union on?
>
> Perhaps a method such as append would be useful and clearer.


