spark-user mailing list archives

From Jason <>
Subject Re: Getting number of physical machines in Spark
Date Fri, 28 Aug 2015 20:09:19 GMT
I've wanted similar functionality too. When I was network-IO bound (in my
case, pulling data from S3 into HDFS), I wished there were a `.mapMachines`
API so that I wouldn't have to guess at the proper partitioning of a
'driver' RDD such as `sc.parallelize(1 to N, N).map(i => pull the i'th
chunk from S3)`.
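
For what it's worth, here is a rough sketch of one way to pick N, not a
canonical answer. It assumes that the keys of
`sc.getExecutorMemoryStatus` are "host:port" strings (one per block
manager, the driver's included), so counting distinct host parts
approximates the machine count. `HostCount`, `hostOf` and `distinctHosts`
are names made up for illustration, and the sample addresses are
hypothetical.

```scala
object HostCount {
  // Strip the trailing ":port" from a "host:port" address.
  // An address without a colon is returned unchanged.
  def hostOf(address: String): String = {
    val idx = address.lastIndexOf(':')
    if (idx < 0) address else address.substring(0, idx)
  }

  // Collapse a collection of "host:port" addresses to the set of hosts.
  def distinctHosts(addresses: Iterable[String]): Set[String] =
    addresses.map(hostOf).toSet

  def main(args: Array[String]): Unit = {
    // Hypothetical addresses standing in for the keys of
    // sc.getExecutorMemoryStatus (assumed format: "host:port").
    val sample = Seq("node1:41234", "node1:41235", "node2:40001")
    val hosts = distinctHosts(sample)
    println(s"${hosts.size} distinct hosts: ${hosts.toSeq.sorted.mkString(", ")}")

    // In a real application (assumption, not run here):
    //   val n = distinctHosts(sc.getExecutorMemoryStatus.keys).size
    //   sc.parallelize(1 to n, n).map(i => /* pull the i'th chunk */ i)
  }
}
```

Two caveats: the count includes the driver's host, which may or may not
run executors, and Spark does not promise to schedule one partition per
host, so this only sizes the 'driver' RDD rather than pinning work to
machines.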

On Thu, Aug 27, 2015 at 10:01 AM Young, Matthew T <> wrote:

> What’s the canonical way to find out the number of physical machines in a
> cluster at runtime in Spark? I believe SparkContext.defaultParallelism will
> give me the number of cores, but I’m interested in the number of NICs.
> I’m writing a Spark streaming application to ingest from Kafka with the
> Receiver API and want to create one DStream per physical machine for read
> parallelism’s sake. How can I figure out at run time how many machines
> there are so I know how many DStreams to create?
