spark-user mailing list archives

From 王永春 <yongchun.w...@audaque.com>
Subject Question about RDD creations in Spark
Date Mon, 17 Mar 2014 09:39:53 GMT
Hello. I have a question about RDD creation in Spark: when is a new RDD created?

I obtained an initial RDD from the hadoopRDD method of SparkContext and ran a count action
on it. After that, I could examine the RDD on the driver program's webui page. Then I applied
a flatMap transformation to the initial RDD and ran a count action on the RDD reference
returned by that transformation. I expected a new RDD entry to appear on the driver program's
webui page beside the first one, but it did not. There was still only one RDD entry: the
initial RDD.

Can anyone tell me whether no new RDD was created after the flatMap transformation and the
subsequent count action, or whether one was created but not listed on the driver program's
webui page? The Java code fragment follows.

JavaSparkContext sc = new JavaSparkContext(...);
JavaRDD rdd0 = sc.hadoopRDD(...);
rdd0.cache();
rdd0.count();  // I could examine the initial RDD on the webui page after this action.

JavaRDD rdd1 = rdd0.flatMap(...);
rdd1.cache();
rdd1.count();  // I expected a new RDD to be created after this action, but it seems it was not.
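
For reference, here is a rough plain-Java sketch (not Spark code) of the general model the fragment above relies on: a transformation returns a new dataset object with its own id right away, while the data is only computed, and kept if caching was requested, when an action such as count runs. All names here (LazyDatasetSketch, Dataset, isMaterialized) are hypothetical and exist only for this illustration. The sketch says nothing about what the webui page chooses to list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Supplier;

// Toy model only: every name below is hypothetical and unrelated to Spark's classes.
final class LazyDatasetSketch {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();

    static final class Dataset<T> {
        final int id = NEXT_ID.getAndIncrement();  // each dataset object gets its own id
        private final Supplier<List<T>> compute;   // deferred computation
        private boolean cacheRequested = false;
        private List<T> cached = null;

        Dataset(Supplier<List<T>> compute) {
            this.compute = compute;
        }

        // Transformation: builds a NEW dataset object immediately, computes nothing yet.
        <R> Dataset<R> flatMap(Function<T, List<R>> f) {
            return new Dataset<R>(() -> {
                List<R> out = new ArrayList<>();
                for (T item : collect()) {   // collect() of the parent dataset
                    out.addAll(f.apply(item));
                }
                return out;
            });
        }

        // Only marks the dataset for caching; nothing is stored until an action runs.
        Dataset<T> cache() {
            cacheRequested = true;
            return this;
        }

        boolean isMaterialized() {
            return cached != null;
        }

        // Action helper: runs the deferred computation, keeping the result if requested.
        List<T> collect() {
            if (cached != null) {
                return cached;
            }
            List<T> result = compute.get();
            if (cacheRequested) {
                cached = result;
            }
            return result;
        }

        // Action: forces computation and returns the number of elements.
        long count() {
            return collect().size();
        }
    }

    public static void main(String[] args) {
        Dataset<String> d0 = new Dataset<String>(() -> Arrays.asList("a b", "c"));
        d0.cache();
        System.out.println("d0 materialized before count: " + d0.isMaterialized());
        System.out.println("d0 count: " + d0.count());

        Dataset<String> d1 = d0.flatMap(line -> Arrays.asList(line.split(" ")));
        d1.cache();
        System.out.println("d1 is a distinct object: " + (d1.id != d0.id));
        System.out.println("d1 materialized before count: " + d1.isMaterialized());
        System.out.println("d1 count: " + d1.count());
        System.out.println("d1 materialized after count: " + d1.isMaterialized());
    }
}
```

In this model the second dataset exists as an object as soon as flatMap returns, but it holds no data until count is called on it.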

----
Yongchun Wang
Audaque Data Technology Ltd.
Shenzhen, China