spark-user mailing list archives

From Kapil Malik
Subject RE: hdfs replication on saving RDD
Date Sun, 05 Jan 2014 18:01:07 GMT
Sending again without the intranet link (the first message probably got caught as spam).

Hi all,

I have a Spark cluster running on top of a 3-node HDFS cluster. The HDFS replication factor is
set to 2, so if I upload a file with hadoop fs -put something.txt, it is replicated to 2 nodes.
However, when I call rdd.saveAsTextFile(..), the output is saved with replication factor 3 (i.e. on
all nodes). How do I configure Spark to save a text file with the same replication factor that is
configured for Hadoop?


Kapil Malik
