spark-user mailing list archives

From Amit Dutta <>
Subject Call http request from within Spark
Date Thu, 14 Jul 2016 14:52:55 GMT
Hi All,

I have a requirement to call a REST service URL for 300k customer IDs.

What I have tried so far:

custid_rdd = sc.textFile('file:////Users/zzz/CustomerID_SC/Inactive User Hashed LCID List.csv')
# Get all the customer IDs and build the RDDs

profile_rdd = custid_rdd.map(lambda r: getProfile(r.split(',')[0]))


# getProfile is the function that makes the HTTP call

import requests

def getProfile(cust_id):
    api_key = 'txt'
    api_secret = 'yuyuy'
    profile_uri = '{}'
    customer_id = cust_id
    data = None  # stays None when no customer ID is given
    if customer_id is not None:
        data = requests.get(profile_uri.format(customer_id),
                            auth=requests.auth.HTTPBasicAuth(api_key, api_secret))
        # print json.dumps(data.json(), indent=4)
    return data

When I print the JSON dump of the data, I see it returning results from the REST call. But
the count never finishes.

Is there an efficient way of dealing with this? Some posts say we have to define a batch size
etc., but I don't know how.
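
For illustration, one common reading of the batching advice is to do the setup work once per partition instead of once per record, for example by reusing a single HTTP client for all IDs in a partition. The following is only a minimal sketch under that assumption, not a confirmed fix for the job above. The names fetch_partition, make_client and fetch_one are hypothetical; make_client could be requests.Session, and fetch_one would wrap the actual GET call.

```python
def fetch_partition(lines, make_client, fetch_one):
    """Process every line of one partition with a single shared client.

    lines       -- iterator over CSV lines (customer ID in the first column)
    make_client -- hypothetical factory, called once per partition
                   (for example requests.Session)
    fetch_one   -- hypothetical function (client, cust_id) -> result
    """
    client = make_client()  # created once, reused for the whole partition
    try:
        for line in lines:
            cust_id = line.split(',')[0].strip()
            if cust_id:  # skip empty lines
                yield cust_id, fetch_one(client, cust_id)
    finally:
        # Close the client if it supports it (requests.Session does).
        close = getattr(client, 'close', None)
        if close is not None:
            close()

# Assumed usage with the RDD from the post (not run here):
# profile_rdd = custid_rdd.mapPartitions(
#     lambda it: fetch_partition(it, requests.Session, my_fetch))
```

In this sketch fetch_one would probably need to return plain data (for example the response text) rather than the response object itself, so that the results can be serialized between workers and the driver.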

Appreciate your help


