spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From myasuka <myas...@live.com>
Subject How to use multi thread in RDD map function ?
Date Sun, 28 Sep 2014 03:44:02 GMT
Hi, everyone
    I come across with a problem about increasing the concurency. In a
program, after shuffle write, each node should fetch 16 pair matrices to do
matrix multiplication. such as:

*
import breeze.linalg.{DenseMatrix => BDM}

pairs.map(t => {
        val b1 = t._2._1.asInstanceOf[BDM[Double]]
        val b2 = t._2._2.asInstanceOf[BDM[Double]]
      
        val c = (b1 * b2).asInstanceOf[BDM[Double]]

        (new BlockID(t._1.row, t._1.column), c)
      })*
 
    Each node has 16 cores. However, no matter I set 16 tasks or more on
each node, the concurrency cannot be higher than 60%, which means not every
core on the node is computing. Then I check the running log on the WebUI,
according to the amount of shuffle read and write in every task, I see some
task do once matrix multiplication, some do twice while some do none.

    Thus, I think of using java multi thread to increase the concurrency. I
wrote a program in scala which calls java multi thread without Spark on a
single node, by watch the 'top' monitor, I find this program can use CPU up
to 1500% ( means nearly every core are computing). But I have no idea how to
use Java multi thread in RDD transformation.

    Is there any one can provide some example code to use Java multi thread
in RDD transformation, or give any idea to increase the concurrency ?

Thanks for all





--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-use-multi-thread-in-RDD-map-function-tp15286.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Mime
View raw message