spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Xi Shen <>
Subject k-means can only run on one executor with one thread?
Date Fri, 27 Mar 2015 03:04:45 GMT

I have a large data set, and I expects to get 5000 clusters.

I load the raw data, convert them into DenseVector; then I did repartition
and cache; finally I give the RDD[Vector] to KMeans.train().

Now the job is running, and data are loaded. But according to the Spark UI,
all data are loaded onto one executor. I checked that executor, and its CPU
workload is very low. I think it is using only 1 of the 8 cores. And all
other 3 executors are at rest.

Did I miss something? Is it possible to distribute the workload to all 4


View raw message