spark-dev mailing list archives

From Mike Hynes <91m...@gmail.com>
Subject Re: No speedup in MultiLayerPerceptronClassifier with increase in number of cores
Date Sun, 11 Oct 2015 16:19:39 GMT
Having only 2 workers for 5 machines would be your problem: you
probably want 1 worker per physical machine, which entails running the
spark-daemon.sh script to start a worker on those machines.
The partitioning is agnostic to how many executors are available for
running the tasks, so changing the number of partitions alone won't give
you the scalability test you have in mind.
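
One way to check this from spark-shell is sketched below. It assumes the
`sc` and `train` values from the quoted code; `describeParallelism` is a
hypothetical helper name used only for illustration, and 88 is simply the
total core count mentioned in the question. Repartitioning is only expected
to help once a worker is actually running on every machine.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.DataFrame

// Hypothetical helper: print what the driver currently sees.
def describeParallelism(sc: SparkContext, df: DataFrame): Unit = {
  // One entry per registered block manager; this count includes the driver.
  val registered = sc.getExecutorMemoryStatus.size
  println(s"block managers registered (executors + driver): $registered")
  println(s"default parallelism: ${sc.defaultParallelism}")
  println(s"partitions in DataFrame: ${df.rdd.partitions.length}")
}

// `sc` and `train` are assumed to exist already, as in the quoted code.
describeParallelism(sc, train)

// Spread the training data over more partitions (88 = total cores, an assumption).
val trainRepartitioned = train.repartition(88)
describeParallelism(sc, trainRepartitioned)
```

If the first line still reports only two executors plus the driver, more
partitions will just queue up on those two executors.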

On 10/11/15, Disha Shrivastava <dishu.905@gmail.com> wrote:
> Dear Spark developers,
>
> I am trying to study the effect of increasing the number of cores (CPUs) on
> speedup and accuracy (scalability of the Spark ANN) for the MNIST dataset,
> using the ANN implementation provided in the latest Spark release.
>
> I have formed a cluster of 5 machines with 88 cores in total. What is
> troubling me is that even though I have more than 2 workers in my Spark
> cluster, the job gets divided among only 2 workers (executors), which
> Spark takes by default, and hence it takes the same time. I know we can set
> the number of partitions manually, for example with
> sc.parallelize(train_data, 10), which divides the data into 10 partitions
> so that all the workers are involved in the computation. I am using the
> code below:
>
>
> import org.apache.spark.ml.classification.MultilayerPerceptronClassifier
> import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
> import org.apache.spark.mllib.util.MLUtils
> import org.apache.spark.sql.Row
>
> // Load training data
> val data = MLUtils.loadLibSVMFile(sc, "data/10000_libsvm").toDF()
> // Split the data into train and test
> val splits = data.randomSplit(Array(0.7, 0.3), seed = 1234L)
> val train = splits(0)
> val test = splits(1)
> //val tr=sc.parallelize(train,10);
> // specify layers for the neural network:
> // input layer of size 784 (features), one hidden layer of size 160,
> // and output layer of size 10 (classes)
> val layers = Array[Int](784, 160, 10)
> // create the trainer and set its parameters
> val trainer = new MultilayerPerceptronClassifier().setLayers(layers).setBlockSize(128).setSeed(1234L).setMaxIter(100)
> // train the model
> val model = trainer.fit(train)
> // compute precision on the test set
> val result = model.transform(test)
> val predictionAndLabels = result.select("prediction", "label")
> val evaluator = new MulticlassClassificationEvaluator().setMetricName("precision")
> println("Precision:" + evaluator.evaluate(predictionAndLabels))
>
> Can you please suggest how I can ensure that the data/tasks are divided
> equally among all the worker machines?
>
> Thanks and Regards,
> Disha Shrivastava
> Masters student, IIT Delhi
>


-- 
Thanks,
Mike
