Hi lihu,

Maybe the data you're accessing is in HDFS and resides on only 4 of your 20 machines because it's only about 4 blocks (at the default 64 MB per block, that's around a quarter GB). Where is your source data located, and how is it stored?
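
To make the arithmetic concrete, here is a rough sketch of how the block count follows from the file size. The 256 MB input size is only my guess at your data, and the function name is made up for illustration; the 64 MB figure is the default block size mentioned above.

```python
MB = 1024 * 1024

def hdfs_block_count(file_size_bytes, block_size_bytes=64 * MB):
    """Return how many blocks a file of this size would occupy (hypothetical helper)."""
    if file_size_bytes <= 0:
        return 0
    # Integer ceiling division: a partially filled last block still counts as a block.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes

# Assumed input size: 256 MB, i.e. about a quarter GB.
print(hdfs_block_count(256 * MB))
```

If the count comes out smaller than the number of machines, that would be consistent with only a few of them getting tasks when the file is first read, since reading typically runs one task per block.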


On Thu, Jan 2, 2014 at 7:53 AM, lihu <lihu723@gmail.com> wrote:
   I run Spark on a cluster with 20 machines, but when I start an application using the spark-shell, only 4 machines are working; the others are just idle, with no memory or CPU in use. I observed this through the web UI.

   I wondered whether the other machines might be busy, so I checked them with the "top" and "free" commands, but that is not the case.
   So I wonder why Spark does not assign work to all 20 machines? This is not good resource usage.