spark-user mailing list archives

From Sean Owen <so...@cloudera.com>
Subject Re: JavaRDD.foreach (new VoidFunction<>...) always returns the last element
Date Mon, 25 Jul 2016 18:01:44 GMT
Why are you converting to RDD and back to JavaRDD?
The problem is storing references to Writable objects, which the
InputFormat reuses and mutates for each record. Somewhere you have 1000
references to the same key object. I think the persist may be what holds
them. You want to immediately transform these values into something other
than a Writable.
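
To illustrate the aliasing described above, here is a minimal sketch in plain Java with no Spark or Hadoop dependency. MutableInt and WritableReuseDemo are hypothetical names made up for this illustration: MutableInt stands in for a Writable such as IntWritable, and the loops stand in for a reader that reuses a single object per record. It is a simulation under those assumptions, not the actual Spark code path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Hadoop Writable such as IntWritable: a mutable box.
final class MutableInt {
    private int value;
    void set(int v) { value = v; }
    int get() { return value; }
}

class WritableReuseDemo {
    // Mimics a reader that reuses ONE object for every record while the
    // caller keeps references to it (roughly what caching raw records does).
    static List<Integer> storeReferences(int n) {
        MutableInt reused = new MutableInt();
        List<MutableInt> cached = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            reused.set(i);       // the reader overwrites the same object
            cached.add(reused);  // only a reference is stored
        }
        List<Integer> seen = new ArrayList<>();
        for (MutableInt m : cached) {
            seen.add(m.get());   // every entry now shows the last value written
        }
        return seen;
    }

    // Same reader, but each value is copied into an immutable Integer
    // before it is stored, so later mutation cannot affect it.
    static List<Integer> copyValues(int n) {
        MutableInt reused = new MutableInt();
        List<Integer> cached = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            reused.set(i);
            cached.add(reused.get());  // copy taken immediately
        }
        return cached;
    }

    public static void main(String[] args) {
        System.out.println(storeReferences(3)); // prints [2, 2, 2]
        System.out.println(copyValues(3));      // prints [0, 1, 2]
    }
}
```

In the Spark job, the analogous step would presumably be a map over the freshly read pairs that extracts key.get() and a copy of the double array before calling persist, so that the cached records no longer point at the reused Writable instances.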

On Mon, Jul 25, 2016, 18:50 Jia Zou <jacquelinezou@gmail.com> wrote:

>
> My code is as follows:
>
>     System.out.println("Initialize points...");
>     JavaPairRDD<IntWritable, DoubleArrayWritable> data =
>         sc.sequenceFile(inputFile, IntWritable.class, DoubleArrayWritable.class);
>     RDD<Tuple2<IntWritable, DoubleArrayWritable>> rdd =
>         JavaPairRDD.toRDD(data);
>     JavaRDD<Tuple2<IntWritable, DoubleArrayWritable>> points =
>         JavaRDD.fromRDD(rdd, data.classTag());
>     points.persist(StorageLevel.MEMORY_ONLY());
>     int i;
>     for (i=0; i<iterations; i++) {
>         System.out.println("iteration="+i);
>         //points.foreach(new ForEachMapPointToCluster(numDimensions, numClusters));
>         points.foreach(new VoidFunction<Tuple2<IntWritable, DoubleArrayWritable>>() {
>             public void call(Tuple2<IntWritable, DoubleArrayWritable> tuple) {
>                 IntWritable key = tuple._1();
>                 System.out.println("key:"+key.get());
>                 DoubleArrayWritable array = tuple._2();
>                 double[] point = array.getData();
>                 for (int d = 0; d < 20; d ++) {
>                     System.out.println(d+":"+point[d]);
>                 }
>             }
>         });
>     }
>
>
> The output is many repetitions of the following; only the last element
> in the RDD is ever printed.
>
> key:999
> 0:0.9953839426689233
> 1:0.12656798341145892
> 2:0.16621114723289654
> 3:0.48628049787614236
> 4:0.476991470215116
> 5:0.5033640235789054
> 6:0.09257098597507829
> 7:0.3153088440494892
> 8:0.8807426085223242
> 9:0.2809625780570739
> 10:0.9584880094505738
> 11:0.38521222520661547
> 12:0.5114241334425228
> 13:0.9524628903835111
> 14:0.5252549496842003
> 15:0.5732037830866236
> 16:0.8632451606583632
> 17:0.39754347061499895
> 18:0.2859522809981715
> 19:0.2659002343432888
> key:999
> 0:0.9953839426689233
> 1:0.12656798341145892
> 2:0.16621114723289654
> 3:0.48628049787614236
> 4:0.476991470215116
> 5:0.5033640235789054
> 6:0.09257098597507829
> 7:0.3153088440494892
> 8:0.8807426085223242
> 9:0.2809625780570739
> 10:0.9584880094505738
> 11:0.38521222520661547
> 12:0.5114241334425228
> 13:0.9524628903835111
> 14:0.5252549496842003
> 15:0.5732037830866236
> 16:0.8632451606583632
> 17:0.39754347061499895
> 18:0.2859522809981715
> 19:0.2659002343432888
> key:999
> 0:0.9953839426689233
> 1:0.12656798341145892
> 2:0.16621114723289654
> 3:0.48628049787614236
> 4:0.476991470215116
> 5:0.5033640235789054
> 6:0.09257098597507829
> 7:0.3153088440494892
> 8:0.8807426085223242
> 9:0.2809625780570739
> 10:0.9584880094505738
> 11:0.38521222520661547
> 12:0.5114241334425228
> 13:0.9524628903835111
> 14:0.5252549496842003
> 15:0.5732037830866236
> 16:0.8632451606583632
> 17:0.39754347061499895
> 18:0.2859522809981715
> 19:0.2659002343432888
>
