I'm trying to understand how Spark works under the hood, so I tried to read the source code.

As I normally do, I downloaded the source from Git and went back to the very first version (actually e5c4cd8a5e188592f8786a265c0cd073c69ac886, since the first version did not even include RDD.scala).

But the code looks "too simple", and I can't find where the "magic" happens, i.e. where a transformation or computation is scheduled on a machine, where bytes are stored, etc.

It would be great if someone could show me a path through the source files involved, in the order they come into play, so that I could read each of them in turn.