spark-user mailing list archives

From Rishi Yadav <ri...@infoobjects.com>
Subject Re: reduceByKey and empty output files
Date Mon, 01 Dec 2014 00:19:51 GMT
How big is your input dataset?
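If the input is small, it may be worth checking how many records actually landed in each output file. Below is a minimal sketch that counts the lines in each `part-*` file written by `saveAsTextFile`. It assumes the output directory has first been copied to the local filesystem (for example with `hdfs dfs -get`) and that each line is one record. The function names are hypothetical, not part of Spark.

```python
import glob
import os


def records_per_part_file(output_dir):
    """Count the lines (records) in each part-* file under output_dir.

    Assumes output_dir is a local copy of a saveAsTextFile output
    directory; marker files such as _SUCCESS are skipped by the glob.
    """
    counts = {}
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(path, encoding="utf-8") as f:
            # Each line written by saveAsTextFile corresponds to one record.
            counts[os.path.basename(path)] = sum(1 for _ in f)
    return counts


def empty_partitions(counts):
    """Return the names of part files that contain no records."""
    return [name for name, n in counts.items() if n == 0]
```

Inside the job, `rdd.glom().map(len).collect()` gives the per-partition record counts directly, without writing anything to HDFS.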

On Thursday, November 27, 2014, Praveen Sripati <praveensripati@gmail.com>
wrote:

> Hi,
>
> When I run the program below, I see two files in HDFS because the
> number of partitions is 2. But one of the files is empty. Why is that? Is
> the work not distributed equally across the tasks?
>
> (textFile.flatMap(lambda line: line.split())
>     .map(lambda word: (word, 1))
>     .reduceByKey(lambda a, b: a + b)
>     .repartition(2)
>     .saveAsTextFile("hdfs://localhost:9000/user/praveen/output/"))
>
> Thanks,
> Praveen
>


-- 
- Rishi
