spark-dev mailing list archives

From cheez <>
Subject Bucket mappings of map stage output
Date Thu, 06 Aug 2015 21:47:09 GMT
Hey all. 
I was trying to understand Spark internals by looking into (and hacking on)
the code. Specifically, I was trying to explore the buckets that are generated
when the output of each map task is partitioned, which the reduce side then
fetches on the basis of partitionId. I went into the write() method of
SortShuffleWriter, which takes an Iterator named records as an argument. I
thought the key-value pairs in this iterator represented the buckets. But
upon exploring its contents I realized that I was wrong, because pairs with
the same key were showing up in different buckets, which should not have been
the case.
I'd really appreciate it if someone could help me find where these buckets
