spark-user mailing list archives

From Stephen Coy <s...@infomedia.com.au.INVALID>
Subject When does SparkContext.defaultParallelism have the correct value?
Date Tue, 07 Jul 2020 03:35:02 GMT
Hi there,

I have found that if I invoke

sparkContext.defaultParallelism()

too early, it does not return the correct value.

For example, if I write this:

final JavaSparkContext sparkContext = new JavaSparkContext(sparkSession.sparkContext());
final int workerCount = sparkContext.defaultParallelism();

I will get some small number (which I can’t recall right now).

However, if I insert:

sparkContext.parallelize(List.of(1, 2, 3, 4)).collect();

between these two lines, I get the expected value, which is something like node_count * node_core_count.

This seems like a hacky workaround to me. Is there a better way to get this value
initialised properly?

FWIW, I need this value to size a connection pool (fs.s3a.connection.maximum) correctly in
a cluster-independent way.
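
To make the sizing step concrete, here is a minimal sketch of the kind of calculation I have in mind. The class name S3aPoolSizing, the method connectionMaximum, and the headroom and floor parameters are all hypothetical names made up for illustration; only the parallelism input and the fs.s3a.connection.maximum key come from my actual setup, and the specific numbers are placeholders rather than recommendations.

```java
public final class S3aPoolSizing {

    private S3aPoolSizing() {
    }

    /**
     * Hypothetical helper: derives a value for fs.s3a.connection.maximum
     * from the parallelism reported by Spark.
     *
     * @param defaultParallelism value read from sparkContext.defaultParallelism()
     * @param headroom           assumed extra connections for driver-side work
     * @param floor              assumed lower bound for the pool size
     */
    public static int connectionMaximum(int defaultParallelism, int headroom, int floor) {
        if (defaultParallelism < 1 || headroom < 0 || floor < 1) {
            throw new IllegalArgumentException("arguments out of range");
        }
        // Use long arithmetic so the sum cannot overflow an int.
        long wanted = (long) defaultParallelism + headroom;
        return (int) Math.min(Integer.MAX_VALUE, Math.max(floor, wanted));
    }

    public static void main(String[] args) {
        // Stand-in for sparkContext.defaultParallelism(), e.g. 4 nodes * 8 cores.
        int parallelism = 4 * 8;
        int poolSize = connectionMaximum(parallelism, 4, 16);

        // In the real job the result would be applied with something like:
        //   sparkContext.hadoopConfiguration()
        //       .set("fs.s3a.connection.maximum", Integer.toString(poolSize));
        System.out.println(poolSize);
    }
}
```

The result is only as good as the parallelism value fed into it, which is why the timing of the defaultParallelism() call matters.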

Thanks,

Steve C

