Thanks. My point is that earlier versions are normally much simpler, so they are easier to follow, and the basic structure should at least bear a strong similarity to the latest version.

On Sun, Jul 19, 2015 at 9:27 PM, Ted Yu <> wrote:
e5c4cd8a5e188592f8786a265 was from 2011.

Not sure why you started with such an early commit.

The Spark project has evolved quite fast.

I suggest you clone the Spark project and start with core/src/main/scala/org/apache/spark/rdd/RDD.scala
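
As a rough guide to what to look for when reading RDD.scala, here is a toy model of the general idea: transformations such as map only record a new dataset that points at its parent, and nothing runs until an action such as collect hands the job to a scheduler. This is a sketch in Python, not Spark code. Every name in it (ToyRDD, ParallelCollection, MappedRDD, local_run_job) is hypothetical, and the "scheduler" simply loops over partitions in the current process.

```python
# Toy model of lazy, lineage-based datasets. NOT Spark code; all names are hypothetical.

class ToyRDD:
    """A dataset split into partitions; subclasses say how to compute one partition."""

    def __init__(self, num_partitions):
        self.num_partitions = num_partitions

    def compute(self, index):
        """Return an iterator over the elements of partition `index`."""
        raise NotImplementedError

    # Transformations: no work is done here, only a new node is recorded.
    def map(self, f):
        return MappedRDD(self, lambda it: (f(x) for x in it))

    def filter(self, pred):
        return MappedRDD(self, lambda it: (x for x in it if pred(x)))

    # Action: the only place where computation is actually triggered.
    def collect(self, run_job=None):
        run_job = run_job or local_run_job
        return [x for part in run_job(self) for x in part]


class ParallelCollection(ToyRDD):
    """Source dataset built from an in-memory list."""

    def __init__(self, data, num_partitions):
        super().__init__(num_partitions)
        # Round-robin split of the input into partitions.
        self.slices = [data[i::num_partitions] for i in range(num_partitions)]

    def compute(self, index):
        return iter(self.slices[index])


class MappedRDD(ToyRDD):
    """Dataset derived from a parent by applying a per-partition transform."""

    def __init__(self, parent, transform):
        super().__init__(parent.num_partitions)
        self.parent = parent
        self.transform = transform

    def compute(self, index):
        # Pull from the parent lazily, then apply this node's transform.
        return self.transform(self.parent.compute(index))


def local_run_job(rdd):
    """Stand-in for a scheduler: one 'task' per partition, run in this process."""
    return [list(rdd.compute(i)) for i in range(rdd.num_partitions)]


if __name__ == "__main__":
    rdd = ParallelCollection(list(range(6)), 2).map(lambda x: x * 10)
    print(rdd.filter(lambda x: x >= 20).collect())
```

In the real code base the same split is visible: RDD.scala defines the transformations and a per-partition compute method, while actions such as collect go through SparkContext.runJob, which is where scheduling begins. That makes SparkContext a reasonable second file to read.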


On Sun, Jul 19, 2015 at 7:44 PM, Yang <> wrote:
I'm trying to understand how Spark works under the hood, so I tried to read the source code.

As I normally do, I downloaded the source from git and checked out the very first version (actually e5c4cd8a5e188592f8786a265c0cd073c69ac886, since the first version did not even contain RDD.scala).

But the code looks "too simple" and I can't find where the "magic" happens, i.e. where a transformation or computation is scheduled on a machine, where bytes are stored, etc.

It would be great if someone could show me a path through the source files involved, so that I could read each of them in turn.