spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Deenar Toraskar <deenar.toras...@gmail.com>
Subject Re: get host from rdd map
Date Mon, 26 Oct 2015 06:42:13 GMT
   1. You can call any api that returns you the hostname in your map
   function. Here's a simplified example, You would generally use
   mapPartitions as it will save the overhead of retrieving hostname multiple
   times
   2.
   3. import scala.sys.process._
   4. val distinctHosts = sc.parallelize(0 to 100).map { _ =>
   5. val hostname = ("hostname".!!).trim
   6. // your code
   7. (hostname)
   8. }.collect.distinct
   9.


On 24 October 2015 at 01:41, weoccc <weoccc@gmail.com> wrote:

> yea,
>
> my use cases is that i want to have some external communications where rdd
> is being run in map. The external communication might be handled separately
> transparent to spark.  What will be the hacky way and nonhacky way to do
> that ? :)
>
> Weide
>
>
>
> On Fri, Oct 23, 2015 at 5:32 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> Can you outline your use case a bit more ?
>>
>> Do you want to know all the hosts which would run the map ?
>>
>> Cheers
>>
>> On Fri, Oct 23, 2015 at 5:16 PM, weoccc <weoccc@gmail.com> wrote:
>>
>>> in rdd map function, is there a way i can know the list of host names
>>> where the map runs ? any code sample would be appreciated ?
>>>
>>> thx,
>>>
>>> Weide
>>>
>>>
>>>
>>
>

Mime
View raw message