spark-user mailing list archives

From Tobias Pfeiffer <...@preferred.jp>
Subject Re: RDDs
Date Thu, 04 Sep 2014 00:55:28 GMT
Hello,


On Wed, Sep 3, 2014 at 6:02 PM, rapelly kartheek <kartheek.mbms@gmail.com>
wrote:
>
> Can someone tell me what kinds of operations can be performed on a
> replicated RDD? What are the use cases of a replicated RDD?
>

I suggest you read

https://spark.apache.org/docs/latest/programming-guide.html#resilient-distributed-datasets-rdds
as an introduction; it lists many of the transformations and output
operations (actions) you can use.
Personally, I also found it quite helpful to read the paper about RDDs:
  http://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf


> One basic question that has been bothering me for a long time: what is the
> difference between an application and a job in Spark parlance? I am
> confused because of the Hadoop jargon.
>

OK, someone else might answer that. I am confused by application,
job, task, stage, etc. myself. ;-)

Tobias
