spark-user mailing list archives

From Naveen Kumar Pokala <>
Subject Parallelize on spark context
Date Fri, 07 Nov 2014 06:43:06 GMT

JavaRDD<Integer> distData = sc.parallelize(data);

On what basis does parallelize split the data into multiple partitions? How can we control
how many partitions are executed per executor?

For example, my data is a list of 1000 integers and I have a 2-node YARN cluster. Spark is
dividing it into 2 partitions of 500 elements each.
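For reference, `parallelize` has a two-argument overload, `sc.parallelize(data, numSlices)`, that lets the caller choose the partition count explicitly; when it is omitted, Spark falls back to `spark.default.parallelism`. A minimal sketch (the class name and the local master are illustrative, not from the original message; on YARN the master would come from spark-submit):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ParallelizeExample {
    public static void main(String[] args) {
        // Local master for illustration only; on a YARN cluster the master
        // is supplied by spark-submit, not hard-coded here.
        SparkConf conf = new SparkConf()
                .setAppName("ParallelizeExample")
                .setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            data.add(i);
        }

        // Default partitioning: governed by spark.default.parallelism.
        JavaRDD<Integer> distData = sc.parallelize(data);
        System.out.println("default partitions: " + distData.partitions().size());

        // Explicit partitioning: ask for 10 slices of ~100 integers each.
        JavaRDD<Integer> tenSlices = sc.parallelize(data, 10);
        System.out.println("explicit partitions: " + tenSlices.partitions().size());

        sc.stop();
    }
}
```

Note that `numSlices` controls the number of partitions in the RDD, not how many run on each executor; the scheduler assigns partitions to executors based on available cores, so raising the partition count is the usual way to get more tasks per executor.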
