spark-user mailing list archives

From Adam Smith <>
Subject HPC with Spark? Simultaneous, parallel one to one mapping of partition to vcore
Date Sun, 20 Nov 2016 00:44:50 GMT
Dear community,

I have an RDD with N rows and N partitions. I want to ensure that the
partitions all run at the same time, by setting the number of vcores
(spark-yarn) to N. The partitions need to talk to each other over some
socket-based sync, which is why I need them to run more or less
simultaneously.

Let's assume no node will die. Will my setup guarantee that all partitions
are computed in parallel?

I know this is somewhat hackish. Is there a better way of doing so?

My goal is to replicate message passing (like OpenMPI) with Spark, where
I have very specific and final communication requirements. So there is no
need for the full comm-and-sync functionality, just what I already have:
sync and talk.
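To illustrate, here is a minimal sketch of the sync-and-talk pattern I
mean. This is plain Scala, not Spark: one thread stands in for each
partition's task, and a CyclicBarrier stands in for the socket-based
sync (the object name and "worker" labels are just placeholders):

```scala
import java.util.concurrent.CyclicBarrier

// Hypothetical sketch (not Spark): N workers running in parallel,
// each blocking at a barrier until all N have arrived, then "talking".
// In the Spark setup described above, each worker would be one
// partition's task (one per vcore) and the barrier would be the
// socket-based sync between partitions.
object BarrierSketch {
  def run(n: Int): Seq[String] = {
    val results = new Array[String](n)
    val barrier = new CyclicBarrier(n)
    val threads = (0 until n).map { i =>
      new Thread(() => {
        // ... per-partition setup / socket connection would go here ...
        barrier.await() // all N workers must reach this point together
        results(i) = s"worker-$i synced"
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    results.toSeq
  }

  def main(args: Array[String]): Unit =
    run(4).foreach(println)
}
```

Note that if fewer than N workers are actually scheduled at once, every
thread blocks at `await()` forever, which is exactly the failure mode I
am worried about with fewer than N vcores.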

