spark-user mailing list archives

From Nick Pentreath <nick.pentre...@gmail.com>
Subject Re: RDD[Future[T]] => Future[RDD[T]]
Date Mon, 27 Jul 2015 08:18:47 GMT
In this case, each partition will block until the futures in that partition
are completed.

If you are in the end collecting all the Futures to the driver, what is the
reasoning behind using an RDD? You could just use a bunch of Futures
directly.
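A minimal sketch of that alternative: if the results all end up on the driver anyway, fire the async calls from a plain collection and sequence them, with no RDD involved. Here `httpCall` is a stand-in for the async IO described in the thread, not code from it.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Stand-in for the real async IO call from the thread.
def httpCall(id: Int): Future[String] = Future(s"response-$id")

// Fire all calls from a plain collection, then flip
// List[Future[String]] into a single Future[List[String]].
val futures: List[Future[String]] = List(1, 2, 3).map(httpCall)
val all: Future[List[String]] = Future.sequence(futures)

// Block only here, once, on the driver side.
val responses: List[String] = Await.result(all, 10.seconds)
// responses == List("response-1", "response-2", "response-3")
```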

If you want to do some processing on the results of the futures, then I'd
say you would need to block in each partition until the Futures' results
are completed, as I'm not at all sure whether Futures would be composable
across stage / task boundaries.
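The per-partition blocking can be sketched without a cluster: a plain Iterator below stands in for what Spark's mapPartitions would hand each task, and the function body is what you would pass to mapPartitions on the real RDD[Future[T]]. The names and timeout are illustrative assumptions, not from the thread.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// What each partition's task would do: flip Iterator[Future[T]] into a
// single Future, then block this task thread until every future completes.
def awaitPartition[T](part: Iterator[Future[T]]): Iterator[T] =
  // toList forces the iterator so Future.sequence gets a strict collection.
  Await.result(Future.sequence(part.toList), 30.seconds).iterator

// Simulate one partition's worth of in-flight futures.
val fakePartition: Iterator[Future[Int]] = Iterator(Future(1), Future(2), Future(3))
val collected: List[Int] = awaitPartition(fakePartition).toList
// collected == List(1, 2, 3)
```

On a real RDD this would be `rdd.mapPartitions(awaitPartition)`; the Await then blocks each task until that partition's futures are done, which is the behaviour described above.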



On Mon, Jul 27, 2015 at 9:33 AM, Ayoub <benali.ayoub.info@gmail.com> wrote:

> do you mean something like this ?
>
> val values = rdd.mapPartitions { i: Iterator[Future[T]] =>
>   val future: Future[Iterator[T]] = Future.sequence(i)
>   Await.result(future, someTimeout)
> }
>
>
> Where does the blocking happen in this case? It seems to me that all the
> workers will be blocked until the future is completed, no?
>
> 2015-07-27 7:24 GMT+02:00 Nick Pentreath <[hidden email]>:
>
>> You could use Iterator.single on the Future[Iterator[T]].
>>
>> However if you collect all the partitions I'm not sure if it will work
>> across executor boundaries. Perhaps you may need to await the sequence of
>> futures in each partition and return the resulting iterator.
>>
>> —
>> Sent from Mailbox <https://www.dropbox.com/mailbox>
>>
>>
>> On Sun, Jul 26, 2015 at 10:43 PM, Ayoub Benali <[hidden email]> wrote:
>>
>>> It doesn't work because mapPartitions expects a function f: (Iterator[T])
>>> ⇒ Iterator[U], while Future.sequence wraps the iterator in a Future.
>>>
>>> 2015-07-26 22:25 GMT+02:00 Ignacio Blasco <[hidden email]>:
>>>
>>>> Maybe using mapPartitions and .sequence inside it?
>>>> On 26/7/2015 at 10:22 PM, "Ayoub" <[hidden email]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am trying to convert the result I get after doing some async IO:
>>>>>
>>>>> val rdd: RDD[T] = // some rdd
>>>>>
>>>>> val result: RDD[Future[T]] = rdd.map(httpCall)
>>>>>
>>>>> Is there a way to collect all the futures once they are completed, in
>>>>> a *non-blocking* (i.e. without scala.concurrent.Await) and lazy way?
>>>>>
>>>>> If the RDD were a standard Scala collection, then calling
>>>>> "scala.concurrent.Future.sequence" would resolve the issue, but RDD is
>>>>> not a TraversableOnce (which the method requires).
>>>>>
>>>>> Is there a way to do this kind of transformation with an
>>>>> RDD[Future[T]] ?
>>>>>
>>>>> Thanks,
>>>>> Ayoub.
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/RDD-Future-T-Future-RDD-T-tp24000.html
>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>> Nabble.com.
>>>>>
>>>>>
>>>
>>
>
