spark-user mailing list archives

From Amit Dutta <amitkrdu...@outlook.com>
Subject Call http request from within Spark
Date Thu, 14 Jul 2016 14:52:55 GMT
Hi All,


I have a requirement to call a REST service URL for 300k customer IDs.

Here is what I have tried so far:


import requests

# getProfile is the method that performs the HTTP call
def getProfile(cust_id):
    api_key = 'txt'
    api_secret = 'yuyuy'
    profile_uri = 'https://profile.localytics.com/x1/customers/{}'

    if cust_id is None:
        return None
    data = requests.get(profile_uri.format(cust_id),
                        auth=requests.auth.HTTPBasicAuth(api_key, api_secret))
    # print json.dumps(data.json(), indent=4)
    return data

# read all the customer ids and issue one request per id
custid_rdd = sc.textFile('file:////Users/zzz/CustomerID_SC/Inactive User Hashed LCID List.csv')
profile_rdd = custid_rdd.map(lambda r: getProfile(r.split(',')[0]))
profile_rdd.count()


When I print the JSON dump of the data I can see results coming back from the REST call, but the count never finishes.

Is there an efficient way of dealing with this? Some posts say we have to define a batch size etc., but I don't know how.
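(Not from the original post, just one common pattern for this kind of bulk-HTTP workload.) A plain `map` opens a fresh connection for every one of the 300k IDs; `mapPartitions` lets each partition build a single `requests.Session` and reuse it for all of its rows, and `repartition(n)` then controls how many of these "batches" run in parallel. The function name `fetch_profiles` and its injectable `get` parameter below are my own sketch, not anything from the Localytics API:

```python
API_KEY = 'txt'          # placeholder credentials from the original post
API_SECRET = 'yuyuy'
PROFILE_URI = 'https://profile.localytics.com/x1/customers/{}'

def fetch_profiles(lines, get=None):
    """Fetch one profile per CSV line, reusing a single HTTP session.

    `get` is injectable so the function can be tested without the
    network; by default one requests.Session (with basic auth) is
    created per partition and shared by every request in it.
    """
    if get is None:
        import requests  # third-party; only needed for real HTTP calls
        session = requests.Session()
        session.auth = requests.auth.HTTPBasicAuth(API_KEY, API_SECRET)
        get = session.get
    for line in lines:
        cust_id = line.split(',')[0]
        if not cust_id:          # skip blank ids instead of failing
            continue
        yield get(PROFILE_URI.format(cust_id))

# On the Spark driver, each partition then shares one session:
# profile_rdd = custid_rdd.repartition(32).mapPartitions(fetch_profiles)
# profile_rdd.count()
```

The `repartition(32)` value is only illustrative; tune it to however many concurrent connections the profile service can tolerate.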


Appreciate your help


Regards,

Amit
