mahout-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: Mahout 1.0: parallelism/number tasks during SimilarityAnalysis.rowSimilarity
Date Mon, 13 Oct 2014 16:06:42 GMT
On Mon, Oct 13, 2014 at 11:56 AM, Reinis Vicups <mahout@orbit-x.de> wrote:

> I have my own implementation of SimilarityAnalysis, and by tuning the number of
> tasks I have reached HUGE performance gains.
>
> Since I couldn't find how to pass the number of tasks to shuffle
> operations directly, I have set the following in the Spark config:
>
> configuration = new SparkConf().setAppName(jobConfig.jobName)
>         .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>         .set("spark.kryo.registrator", "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator")
>         .set("spark.kryo.referenceTracking", "false")
>         .set("spark.kryoserializer.buffer.mb", "200")
>         // this is the line supposed to set default parallelism to some high number
>         // (SparkConf.set takes String values, so the number is quoted)
>         .set("spark.default.parallelism", "400")
>
> Thank you for your help
>

Thank you for YOUR help!

Do you think that simply increasing this parameter is a safe and sane thing
to do?
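
For comparison, here is a minimal sketch of the two knobs in plain Spark RDD code. It assumes you are calling the RDD shuffle operations yourself; operators that go through Mahout's DRM layer may not expose a partition count. The app name, the local master, and the toy data are made up for illustration, and the numbers 8 and 4 are arbitrary.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ParallelismSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("parallelism-sketch")        // hypothetical app name
      .setMaster("local[2]")                   // local master, for illustration only
      .set("spark.default.parallelism", "8")   // global default; value must be a String

    val sc = new SparkContext(conf)
    try {
      // Toy key/value data standing in for real input (assumption).
      val pairs = sc.parallelize(1 to 1000).map(i => (i % 10, 1))

      // No partition count given: the shuffle uses Spark's default partitioner,
      // which consults spark.default.parallelism when it is set.
      val byDefault = pairs.reduceByKey(_ + _)

      // Partition count passed directly to this one shuffle operation.
      val explicit = pairs.reduceByKey(_ + _, 4)

      println(s"default: ${byDefault.partitions.length}, explicit: ${explicit.partitions.length}")
    } finally {
      sc.stop()
    }
  }
}
```

The per-operation argument only affects that one shuffle, whereas the config setting changes every shuffle that does not specify a count, which is part of why I ask whether raising it globally is safe.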
