Thanks, Rustagi. Yes, the global data is read-only and stays unchanged from
the beginning to the end of the whole Spark task. In fact, it is not only
identical within one map/reduce task, but is also used by many of my
map/reduce tasks. That is why I intend to put a copy of the data on each node
of my cluster. I would like to know whether a Spark map/reduce program can
let all the nodes read it simultaneously from their local disks, rather than
having one node read it and broadcast it to the others. Any suggestions?
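
To make the idea concrete, here is a rough sketch of what I have in mind, not a tested solution. It assumes the data has already been copied to the same path on every node; the path, the pickle format, and the names get_global_data and process_partition are all made up for illustration.

```python
import os
import pickle
import tempfile

# Hypothetical path: assumed to hold the same file on every worker node.
LOCAL_DATA_PATH = os.path.join(tempfile.gettempdir(), "global_data.pkl")

# Cache for the loaded data, one copy per Python worker process.
_cache = None


def get_global_data(path=LOCAL_DATA_PATH):
    """Load the read-only data from local disk the first time it is needed."""
    global _cache
    if _cache is None:
        with open(path, "rb") as f:
            _cache = pickle.load(f)
    return _cache


def process_partition(records):
    """Look up each record in the locally loaded data (illustrative logic only)."""
    data = get_global_data()  # read from this node's own disk, nothing is broadcast
    for key in records:
        yield key, data.get(key)


# With PySpark this would be applied as: rdd.mapPartitions(process_partition)
```

Using mapPartitions rather than map means the lookup of the cached data happens once per partition instead of once per record. I assume the helper would have to live in a module installed on every worker for the cache to survive across tasks, but I have not verified that.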