spark-user mailing list archives

From "Liu, Raymond" <>
Subject RE: RDDs
Date Thu, 04 Sep 2014 06:08:48 GMT
Actually, a replicated RDD and running parallel jobs on the same RDD are two unrelated concepts.
A replicated RDD just stores its data on multiple nodes; this helps with HA and improves the
chance of data locality. It is still one RDD, not two separate RDDs.
As for running two jobs on the same RDD, it doesn't matter whether the RDD is replicated
or not. You can always do it if you wish to.
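
Raymond's point can be sketched without a cluster. The snippet below is a plain-Python analogy, not Spark API: two threads each run an independent "job" over one shared in-memory dataset, much as two Spark actions can be submitted from separate threads against the same cached RDD (the Spark scheduler accepts concurrent job submissions within one application). The dataset, job functions, and thread pool here are all illustrative stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

# A shared in-memory dataset standing in for a cached RDD.
data = list(range(1, 101))

# Two independent "jobs" that both read the same dataset.
def job_sum(ds):
    return sum(ds)

def job_max(ds):
    return max(ds)

# Submit both jobs concurrently, the way two Spark actions can be
# launched from separate threads against one SparkContext.
with ThreadPoolExecutor(max_workers=2) as pool:
    f1 = pool.submit(job_sum, data)
    f2 = pool.submit(job_max, data)
    total, biggest = f1.result(), f2.result()

print(total, biggest)  # 5050 100
```

Whether the underlying data is replicated changes nothing in this pattern; replication only affects where the data physically lives.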

Best Regards,
Raymond Liu

-----Original Message-----
From: Kartheek.R [] 
Sent: Thursday, September 04, 2014 1:24 PM
Subject: RE: RDDs

Thank you Raymond and Tobias.
Yeah, I am very clear about what I was asking. I was talking about a "replicated" RDD only.
Now that I've got my understanding of jobs and applications validated, I wanted to know if
we can replicate an RDD and run two jobs (that need the same RDD) of an application in parallel.


