spark-user mailing list archives

From wxhsdp <>
Subject same partition id means same location?
Date Thu, 01 May 2014 01:25:57 GMT

  I'm just reviewing "Advanced Spark Features", specifically the part about PageRank.

  It says: "any shuffle operation on two RDDs will take on the partitioner of
one of them, if one is set".
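A toy model of that rule, in plain Python rather than Spark's API, may make it concrete: a dataset either carries a partitioner (here reduced to a number of hash partitions) or carries none, and a two-input shuffle operation reuses the first partitioner it finds before falling back to a fresh default. `Dataset`, `pick_partitioner`, `needs_shuffle`, and `DEFAULT_PARTITIONS` are hypothetical names for this sketch, and Spark's real selection logic has more detail than this.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_PARTITIONS = 8  # hypothetical fallback partition count

@dataclass
class Dataset:
    name: str
    partitioner: Optional[int] = None  # number of hash partitions, or None if unpartitioned

def pick_partitioner(left: Dataset, right: Dataset) -> int:
    # Reuse an existing partitioner from either input; otherwise use the default.
    for ds in (left, right):
        if ds.partitioner is not None:
            return ds.partitioner
    return DEFAULT_PARTITIONS

def needs_shuffle(ds: Dataset, chosen: int) -> bool:
    # An input has to be repartitioned unless it already uses the chosen layout.
    return ds.partitioner != chosen

links = Dataset("links", partitioner=4)
ranks0 = Dataset("ranks0")
chosen = pick_partitioner(links, ranks0)
print(chosen, needs_shuffle(links, chosen), needs_shuffle(ranks0, chosen))  # 4 False True
```

In this model only the unpartitioned side of the join moves; the side that already has the partitioner stays where it is.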

  So first we partition Links with a HashPartitioner, then we join Links
and Ranks0. According to the document, Ranks0 will take on the
  HashPartitioner. The following reduceByKey operation also respects the
  HashPartitioner, so when we join Links and Ranks1, there is no shuffle at all.

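The flow described above can be sketched with a toy, pure-Python model (not Spark's API): a dataset is a list of partitions, a hash partitioner picks a partition index for each key, and a join of two datasets laid out by the same partitioner pairs partition i with partition i only. The helpers `hash_partition`, `partition_by`, `map_values`, `reduce_by_key`, `copartitioned_join`, and `layout` are hypothetical; `zlib.crc32` stands in for the key's hash so results are reproducible, and the three-page graph is made up. The flat `contribs` list models the step where records really do move between partitions. The model only tracks partition indices and says nothing about which node holds a partition.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count for the sketch

def hash_partition(key, n=NUM_PARTITIONS):
    # Deterministic stand-in for a hash partitioner: non-negative hash modulo n.
    return zlib.crc32(key.encode("utf-8")) % n

def partition_by(pairs, n=NUM_PARTITIONS):
    # Place each (key, value) pair in the partition chosen by its key.
    parts = [[] for _ in range(n)]
    for k, v in pairs:
        parts[hash_partition(k, n)].append((k, v))
    return parts

def map_values(parts, f):
    # Only values change, so every key stays in the same partition.
    return [[(k, f(v)) for k, v in part] for part in parts]

def reduce_by_key(pairs, op, n=NUM_PARTITIONS):
    # Combine values per key (records move here), then lay the results
    # out with the same hash partitioner that Links uses.
    acc = {}
    for k, v in pairs:
        acc[k] = op(acc[k], v) if k in acc else v
    return partition_by(acc.items(), n)

def copartitioned_join(left, right):
    # Pair partition i of `left` only with partition i of `right`;
    # no record crosses a partition boundary.
    out = []
    for lpart, rpart in zip(left, right):
        rindex = defaultdict(list)
        for k, v in rpart:
            rindex[k].append(v)
        for k, lv in lpart:
            out.extend((k, (lv, rv)) for rv in rindex[k])
    return out

def layout(parts):
    # Map each key to the index of the partition that holds it.
    return {k: i for i, part in enumerate(parts) for k, _ in part}

links = partition_by([("a", ["b", "c"]), ("b", ["c"]), ("c", ["a"])])
ranks = map_values(links, lambda _: 1.0)  # Ranks0: same keys, same partitions

for _ in range(10):
    contribs = [
        (dest, rank / len(neighbors))
        for _, (neighbors, rank) in copartitioned_join(links, ranks)
        for dest in neighbors
    ]
    summed = reduce_by_key(contribs, lambda x, y: x + y)
    ranks = map_values(summed, lambda s: 0.15 + 0.85 * s)

print(layout(ranks) == layout(links))  # True: each key keeps its partition index
```

Because `reduce_by_key` reuses the same partitioner, every rank lands at the same partition index as its links entry, which is what lets the next join skip repartitioning Links in this model.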
  Does that mean partitions of different RDDs with the same id will go to
exactly the same location, even
  if the different RDDs were originally located on different nodes?

Sent from the Apache Spark User List mailing list archive at
