spark-user mailing list archives

From <hamishberri...@tutanota.com>
Subject Parallelism in custom Receiver
Date Tue, 21 Jan 2020 13:36:38 GMT
I wrote a custom receiver that processes data from an external source, and I read in the doc:


    A DStream is associated with a single receiver. For attaining read parallelism multiple
    receivers i.e. multiple DStreams need to be created. A receiver is run within an executor.
    It occupies one core. Ensure that there are enough cores for processing after receiver slots
    are booked i.e. spark.cores.max should take the receiver slots into account. The receivers
    are allocated to executors in a round robin fashion.

https://spark.apache.org/docs/latest/streaming-programming-guide.html#important-points-to-remember
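
If I read that correctly, the receivers book cores directly: with e.g. 3 receivers, cluster mode
would need spark.cores.max of at least 4 (3 receiver slots plus at least one core left for
processing), and local mode would likewise need local[n] with n >= 4.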

So I should be able to launch multiple receivers. But my question is: how do I increase the
parallelism of a Receiver? I do not see any parameter that can be tuned according to the doc -
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.streaming.receiver.Receiver

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    import scala.util.{Failure, Success, Try}

    val conf = new SparkConf().setMaster("local[*]").setAppName("MyAppName")
    val ssc = new StreamingContext(conf, Seconds(1))
    // MyReceiver is my custom Receiver implementation
    val stream = ssc.receiverStream(new MyReceiver())
    stream.print()

    ssc.start()
    Try(ssc.awaitTermination()) match {
      case Success(_) => println("Finish streaming ....")
      case Failure(ex) => println(s"exception : $ex")
    }
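
From the quoted passage, my best guess is that read parallelism comes from creating several
receiver streams and unioning them, rather than from a parameter on Receiver itself. A sketch
of what I have in mind (numReceivers is a name I made up; MyReceiver is my own class):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    import scala.util.{Failure, Success, Try}

    val conf = new SparkConf().setMaster("local[*]").setAppName("MyAppName")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Guess: one DStream per receiver, then union them into a single DStream.
    val numReceivers = 3  // made-up knob; each receiver would occupy one core
    val streams = (1 to numReceivers).map(_ => ssc.receiverStream(new MyReceiver()))
    val unioned = ssc.union(streams)
    unioned.print()

    ssc.start()
    Try(ssc.awaitTermination()) match {
      case Success(_) => println("Finish streaming ....")
      case Failure(ex) => println(s"exception : $ex")
    }

On a cluster I assume the same code applies, with setMaster pointing at the cluster master and
spark.cores.max sized to cover the receiver slots plus processing, as the quoted doc says. Is
that the intended approach, or is there a per-receiver setting I am missing?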

Right now I use local mode, but I would like to learn strategies for launching multiple receivers
for parallelism in both cluster mode and local mode. Appreciate any suggestions!

