spark-user mailing list archives

From Jörn Franke <jornfra...@gmail.com>
Subject Re: How do you perform blocking IO in apache spark job?
Date Mon, 08 Sep 2014 15:40:01 GMT
Hi,

What does the external service provide? Data? Calculations? Can the
service push data to you via Kafka and Spark Streaming? Can you fetch the
necessary data from the service beforehand? The solution to your question
depends on your answers.
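
For the push-based option, a rough sketch of what a Kafka receiver with
Spark Streaming could look like (the ZooKeeper address, consumer group and
topic name are placeholders, not something from your setup):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

// The external service publishes its results to a Kafka topic instead of
// being called synchronously from inside the job.
val conf = new SparkConf().setAppName("ServiceDataStream")
val ssc = new StreamingContext(conf, Seconds(1))

// "zk-host:2181", "spark-consumer-group" and "service-data" are
// illustrative placeholders.
val stream = KafkaUtils.createStream(
  ssc, "zk-host:2181", "spark-consumer-group", Map("service-data" -> 1))

// Each record is a (key, message) pair; keep the message payload and
// process it per micro-batch, with no blocking calls inside the job.
stream.map(_._2).foreachRDD(rdd => rdd.foreach(println))

ssc.start()
ssc.awaitTermination()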

I would not recommend connecting to a blocking service during Spark job
execution. What do you do if a node crashes? Is the order of service calls
relevant for you?
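
If you nevertheless have to call the service from within the job, the usual
pattern for getting parallelism on a single worker is mapPartitions with
batched asynchronous calls, so each partition issues many requests
concurrently instead of blocking per record. A minimal sketch, where
callService, the input rdd, the batch size and the pool size are all
assumptions for illustration:

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import org.apache.spark.rdd.RDD

// Hypothetical blocking client call -- replace with your real client.
def callService(x: Double): Double = ???

def callInParallel(rdd: RDD[Double]): RDD[Double] =
  rdd.mapPartitions { iter =>
    // One pool per partition; 100 threads is an illustrative number
    // chosen to approach the desired request rate. A real implementation
    // should also shut the pool down once the iterator is exhausted.
    implicit val ec = ExecutionContext.fromExecutorService(
      Executors.newFixedThreadPool(100))

    iter.grouped(1000).flatMap { batch =>
      // Future.sequence works here because the batch is a plain
      // collection, unlike the RDD itself.
      val futures = batch.map(x => Future(callService(x)))
      Await.result(Future.sequence(futures), 10.minutes)
    }
  }

Note this is exactly the kind of code that makes failure handling hard: if
a node crashes mid-partition, Spark re-runs the whole partition and repeats
the service calls.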

Best regards
On 8 Sep 2014 at 17:31, "DrKhu" <khudyakov.max@gmail.com> wrote:

> What if, when I traverse an RDD, I need to calculate values in the dataset
> by calling an external (blocking) service? How do you think that could be
> achieved?
>
> val values: Future[RDD[Double]] = Future sequence tasks
>
> I've tried to create a list of Futures, but as an RDD is not Traversable,
> Future.sequence is not suitable.
>
> I just wonder if anyone has had such a problem, and how you solved it? What
> I'm trying to achieve is parallelism on a single worker node, so that I can
> call that external service 3000 times per second.
>
> Probably there is another solution, more suitable for Spark, like having
> multiple worker nodes on a single host.
>
> I'd be interested to know how you cope with such a challenge. Thanks.
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-do-you-perform-blocking-IO-in-apache-spark-job-tp13704.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
> For additional commands, e-mail: user-help@spark.apache.org
>
>
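
On the multiple-workers-per-host idea from the quoted message: in the
standalone deployment this is controlled by SPARK_WORKER_INSTANCES. A
minimal conf/spark-env.sh sketch, with values chosen purely for
illustration:

# conf/spark-env.sh -- illustrative values only
SPARK_WORKER_INSTANCES=3   # run three worker instances on this host
SPARK_WORKER_CORES=2       # cores available to each worker
SPARK_WORKER_MEMORY=2g     # memory available to each worker

That said, more workers mostly buys you more concurrent tasks; the
per-task blocking call remains the bottleneck.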
