Sending again without the intranet link. (The earlier message probably went to spam.)

 

Hi all,

 

I have a Spark cluster running on top of a 3-node HDFS cluster. The HDFS replication factor is set to 2, so if I upload a file with hadoop fs -put something.txt, it is replicated to 2 nodes.

However, when I call rdd.saveAsTextFile(..), the output is saved with replication factor 3 (i.e. on all nodes). How do I configure Spark to save a text file with the same replication factor that is configured for Hadoop?
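
For concreteness, here is a minimal sketch of the kind of setting I am asking about. It assumes the cause is that Spark's HDFS client does not pick up the cluster's hdfs-site.xml and therefore falls back to the client-side default dfs.replication of 3; I have not confirmed that this is what happens here. The application name, the sample data, and the output path are placeholders, and the master is assumed to be supplied by spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SaveWithReplication {
  def main(args: Array[String]): Unit = {
    // Assumption: the master URL is passed in by spark-submit.
    val conf = new SparkConf().setAppName("save-with-replication")
    val sc = new SparkContext(conf)

    // dfs.replication is a client-side HDFS setting. Setting it on the
    // Hadoop configuration held by the SparkContext should apply to
    // files written by this job (assumption: nothing overrides it later).
    sc.hadoopConfiguration.set("dfs.replication", "2")

    // Placeholder data and a hypothetical output path.
    val rdd = sc.parallelize(Seq("a", "b", "c"))
    rdd.saveAsTextFile("hdfs:///tmp/replication-test")

    sc.stop()
  }
}
```

If that assumption holds, the same value could presumably also be passed as a Spark property (spark.hadoop.dfs.replication=2), or picked up by pointing HADOOP_CONF_DIR at the directory containing the cluster's hdfs-site.xml, so that Spark reads the same configuration as the hadoop command.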

 

Thanks,

 

Kapil Malik