spark-user mailing list archives

From ayan guha <guha.a...@gmail.com>
Subject Re: Call http request from within Spark
Date Fri, 15 Jul 2016 08:32:12 GMT
Can you explain what you mean by "the count never stops"?
On 15 Jul 2016 00:53, "Amit Dutta" <amitkrdutta@outlook.com> wrote:

> Hi All,
>
>
> I have a requirement to call a REST service URL for 300k customer IDs.
>
> Here is what I have tried so far:
>
>
> import requests
>
> # getProfile makes the HTTP call for one customer id
> def getProfile(cust_id):
>     if cust_id is None:
>         return None
>     api_key = 'txt'
>     api_secret = 'yuyuy'
>     profile_uri = 'https://profile.localytics.com/x1/customers/{}'
>     data = requests.get(profile_uri.format(cust_id),
>                         auth=requests.auth.HTTPBasicAuth(api_key, api_secret))
>     # print json.dumps(data.json(), indent=4)
>     return data
>
> # read all the customer ids and build an RDD
> custid_rdd = sc.textFile('file:////Users/zzz/CustomerID_SC/Inactive User Hashed LCID List.csv')
>
> profile_rdd = custid_rdd.map(lambda r: getProfile(r.split(',')[0]))
>
> profile_rdd.count()
>
>
> When I print the JSON dump of the data, I can see results coming back from
> the REST call, but the count never finishes.
>
>
> Is there an efficient way of dealing with this? Some posts say we have to
> define a batch size etc., but I don't know how.
>
>
> Appreciate your help
>
>
> Regards,
>
> Amit
>
>
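For readers landing on this thread: a common pattern for this kind of workload is to replace the per-record `map` with `mapPartitions`, so each partition reuses one HTTP session and processes IDs in explicit batches, and to set a request timeout so a single hung call cannot stall the action. A rough sketch, assuming the `requests` library and placeholder credentials (`fetch_partition`, `chunked`, and `batch_size` are illustrative names, not Spark APIs):

```python
def chunked(ids, size):
    """Yield successive lists of at most `size` ids."""
    batch = []
    for i in ids:
        batch.append(i)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def fetch_partition(cust_ids, batch_size=100):
    """Intended to run inside mapPartitions: one HTTP session per partition."""
    import requests  # imported here so it resolves on the executors
    session = requests.Session()
    session.auth = ('api_key', 'api_secret')  # placeholder credentials
    for batch in chunked(cust_ids, batch_size):
        for cid in batch:
            # timeout prevents one hung request from blocking the whole job
            resp = session.get(
                'https://profile.localytics.com/x1/customers/{}'.format(cid),
                timeout=10)
            yield cid, resp.status_code

# Driver side (sketch):
# profile_rdd = custid_rdd.mapPartitions(fetch_partition)
# profile_rdd.count()
```

A `requests.get` call without a `timeout` can block indefinitely, which is one common reason an action like `count()` appears to never finish; reusing a `Session` also avoids opening 300k separate connections.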
