spark-user mailing list archives

From Tobias Pfeiffer <...@preferred.jp>
Subject Re: I think I am almost lost in the internals of Spark
Date Wed, 07 Jan 2015 01:24:44 GMT
Hi,

On Tue, Jan 6, 2015 at 11:24 PM, Todd <bit1129@163.com> wrote:

> I am fairly new to Spark; so far I have only tried simple things like word
> count and the examples given in the Spark SQL programming guide.
> Now I am investigating the internals of Spark, but I think I am almost
> lost, because I cannot grasp the whole picture of what Spark does when it
> executes the word count.
>

I recommend first understanding what an RDD is and how it is processed.
Start with

http://spark.apache.org/docs/latest/programming-guide.html#resilient-distributed-datasets-rdds
and probably also
  http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
  (once the server is back).
Understanding how an RDD is processed is probably the most helpful step
toward understanding Spark as a whole.
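
To make "how an RDD is processed" a bit more concrete, here is a rough
sketch in plain Python. It is a toy model, not Spark's API: the class
ToyRDD and its methods (flat_map, map, reduce_by_key, lineage, collect)
are names made up for illustration, and it ignores partitions, shuffles,
and fault tolerance entirely. It only shows the basic idea from the
programming guide: transformations record a lineage without computing
anything, and the work runs when an action such as collect is called.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's implementation).

    Each instance remembers its parent and the function that derives its
    elements from the parent's elements. Nothing runs until collect().
    """

    def __init__(self, source=None, parent=None, transform=None, name="source"):
        self.source = source        # only set for the root dataset
        self.parent = parent        # lineage pointer to the parent ToyRDD
        self.transform = transform  # function: iterator -> iterator
        self.name = name

    def _derive(self, transform, name):
        # A transformation only builds a new node; it does no work yet.
        return ToyRDD(parent=self, transform=transform, name=name)

    def flat_map(self, f):
        return self._derive(lambda it: (y for x in it for y in f(x)), "flat_map")

    def map(self, f):
        return self._derive(lambda it: (f(x) for x in it), "map")

    def reduce_by_key(self, f):
        def reduce_all(it):
            acc = {}
            for k, v in it:
                acc[k] = f(acc[k], v) if k in acc else v
            return iter(acc.items())
        return self._derive(reduce_all, "reduce_by_key")

    def lineage(self):
        # Walk the parent pointers back to the root dataset.
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

    def _compute(self):
        if self.parent is None:
            return iter(self.source)
        return self.transform(self.parent._compute())

    def collect(self):
        # The "action": only here is the recorded chain actually evaluated.
        return list(self._compute())


# Word count expressed as a chain of lazy transformations.
lines = ToyRDD(source=["a b a", "b c"])
counts = (lines.flat_map(str.split)
               .map(lambda w: (w, 1))
               .reduce_by_key(lambda x, y: x + y))
```

Calling counts.lineage() lists the recorded steps, and counts.collect()
is the point where the word count is actually computed. In real Spark the
same chain is additionally split into stages and tasks per partition,
which this sketch does not attempt to model.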

Tobias
