spark-user mailing list archives

From Burak Yavuz <brk...@gmail.com>
Subject Re: what are the types of tasks when running ALS iterations
Date Mon, 09 Mar 2015 15:47:53 GMT
+user
On Mar 9, 2015 8:47 AM, "Burak Yavuz" <brkyvz@gmail.com> wrote:

> Hi,
> In the web UI, you don't see every single task. A stage is named after the
> last transformation before the stage boundary (a shuffle, such as the one
> groupByKey introduces), which in your case is the flatMap; that is why you
> only see flatMap in the UI. The flatMap and the groupByKey that follows it
> form a single stage. Please take a look at
>
> http://www.slideshare.net/mobile/pwendell/tuning-and-debugging-in-apache-spark
> for further reference.
>
> Burak
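[Editor's note: the stage-naming behavior described above can be sketched outside Spark. The code below is a plain-Scala illustration, not Spark internals; all names in it are made up for the example. "Stage 1" pipelines narrow operations (like the flatMap) and ends at the shuffle write, which is why the UI labels the stage with the last transformation before the boundary; "stage 2" begins at the shuffle read, where groupByKey's grouping actually happens.]

```scala
// Plain-Scala sketch (no Spark) of how a shuffle splits work into two stages.
object StageSketch {
  // "Stage 1": a flatMap-like narrow op runs per record, then the shuffle
  // WRITE buckets each (key, value) pair by key hash. The UI names this stage
  // after the last transformation before the boundary -- the flatMap.
  def mapSideStage(records: Seq[Int], numPartitions: Int): Map[Int, Seq[(Int, Int)]] =
    records
      .flatMap(r => Seq((r % 3, r), (r % 3, r * 10))) // flatMap-like op
      .groupBy { case (key, _) => math.abs(key.hashCode) % numPartitions } // shuffle buckets

  // "Stage 2": the shuffle READ side groups values by key within each bucket
  // (what groupByKey does); a new stage begins here.
  def reduceSideStage(buckets: Map[Int, Seq[(Int, Int)]]): Map[Int, Seq[Int]] =
    buckets.values.flatten.toSeq
      .groupBy(_._1)
      .map { case (k, kvs) => (k, kvs.map(_._2)) }
}
```

Running `reduceSideStage(mapSideStage(...))` recovers the same per-key grouping a flatMap-then-groupByKey pipeline would produce, with the bucketing step standing in for the stage boundary.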
> On Mar 8, 2015 11:44 PM, "lisendong" <lisendong@163.com> wrote:
>
>> The core of ALS in Spark 1.0.0 is the following code. There should be
>> both flatMap and groupByKey tasks when running ALS iterations, right?
>> But when I run the ALS iterations, there are ONLY flatMap tasks.
>> Do you know why?
>>
>>   private def updateFeatures(
>>       products: RDD[(Int, Array[Array[Double]])],
>>       productOutLinks: RDD[(Int, OutLinkBlock)],
>>       userInLinks: RDD[(Int, InLinkBlock)],
>>       partitioner: Partitioner,
>>       rank: Int,
>>       lambda: Double,
>>       alpha: Double,
>>       YtY: Option[Broadcast[DoubleMatrix]])
>>     : RDD[(Int, Array[Array[Double]])] = {
>>     val numBlocks = products.partitions.size
>>     productOutLinks.join(products).flatMap { case (bid, (outLinkBlock, factors)) =>
>>       val toSend = Array.fill(numBlocks)(new ArrayBuffer[Array[Double]])
>>       for (p <- 0 until outLinkBlock.elementIds.length; userBlock <- 0 until numBlocks) {
>>         if (outLinkBlock.shouldSend(p)(userBlock)) {
>>           toSend(userBlock) += factors(p)
>>         }
>>       }
>>       toSend.zipWithIndex.map { case (buf, idx) => (idx, (bid, buf.toArray)) }
>>     }.groupByKey(new HashPartitioner(numBlocks)) // The ALS code in 1.0.0 has a bug
>>       // here: that version used the partitioner passed in, which did not take
>>       // effect and caused data skew.
>>       .join(userInLinks)
>>       .mapValues { case (messages, inLinkBlock) =>
>>         updateBlock(messages, inLinkBlock, rank, lambda, alpha, YtY)
>>       }
>>   }
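[Editor's note: the data-skew remark in the comment above concerns how keys map to partitions. The sketch below is a plain-Scala illustration: `nonNegativeMod` mirrors the modulo logic Spark's HashPartitioner uses, while `partitionCounts` and the sample keys are made up for the example.]

```scala
// Plain-Scala sketch of hash-based key-to-partition assignment.
object PartitionSketch {
  // Non-negative modulo, as used by hash partitioning (Scala's % can go negative).
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // How many keys land in each partition under hash partitioning; a balanced
  // result here means no data skew for these keys.
  def partitionCounts(keys: Seq[Int], numPartitions: Int): Map[Int, Int] =
    keys
      .groupBy(k => nonNegativeMod(k.hashCode, numPartitions))
      .map { case (p, ks) => (p, ks.size) }
}
```

A partitioner that fails to spread keys this evenly concentrates blocks on a few tasks, which is the skew the comment describes.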
>>
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/what-are-the-types-of-tasks-when-running-ALS-iterations-tp21966.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>>
>>
