spark-user mailing list archives

From Divya Narayan <narayan.divy...@gmail.com>
Subject hadoop replication property from spark code not working
Date Wed, 26 Jun 2019 12:22:15 GMT
Hi,

I have a use case where I want to override the default HDFS replication
factor from my Spark code. To do this, I set the Hadoop replication
property like this:

val sc = new SparkContext(conf)
sc.hadoopConfiguration.set("dfs.replication", "1")

My Spark job runs as a cron job at a fixed interval and creates an output
directory for the corresponding hour. The problem I am facing is that for
about 80% of the runs the files are created with replication factor 1
(which is desired), but for the remaining 20% the files are created with
the default replication factor of 2. I am not sure why that is happening.
Any help would be appreciated.
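One thing worth trying (a sketch, not a confirmed fix for the intermittent behaviour): instead of mutating sc.hadoopConfiguration after the context exists, set the property on the SparkConf up front using Spark's "spark.hadoop." prefix, which Spark copies into the Hadoop configuration when the context is created. The app name below is just a placeholder.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Set dfs.replication before the SparkContext is built, so every
// executor and every write path sees the same Hadoop configuration.
val conf = new SparkConf()
  .setAppName("hourly-job") // hypothetical app name
  .set("spark.hadoop.dfs.replication", "1")
val sc = new SparkContext(conf)
```

The same property can be passed on the command line, e.g. `spark-submit --conf spark.hadoop.dfs.replication=1 ...`, which keeps the setting out of the code entirely.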

Thank you
Divya
