spark-user mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: How to authenticate to ADLS from within spark job on the fly
Date Sat, 19 Aug 2017 14:04:51 GMT

On 19 Aug 2017, at 02:42, Imtiaz Ahmed <emtiazahmed@gmail.com> wrote:


Hi All,

I am building a Spark library which developers will use when writing their Spark jobs to get
access to data on Azure Data Lake, but the authentication will depend on the dataset they
ask for. I need to call a REST API from within the Spark job to get credentials and authenticate
to read data from ADLS. Is that even possible? I am new to Spark.

E.g., from inside a Spark job a user will say:

MyCredentials myCredentials = MyLibrary.getCredentialsForPath(userId, "/some/path/on/azure/datalake");

Then, before spark.read.json("adl://examples/src/main/resources/people.json"),
I need to authenticate the user so they can read that path using the credentials fetched
above.
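To make that concrete, roughly what I have in mind is the following (MyLibrary and MyCredentials are just my placeholder names, and I am assuming the fetched credentials can be pushed into the ADLS connector's fs.adl.oauth2.* configuration keys before the read):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    SparkSession spark = SparkSession.builder().appName("adls-example").getOrCreate();

    // hypothetical library call that hits our REST service for this dataset's credentials
    MyCredentials myCredentials = MyLibrary.getCredentialsForPath(userId, "/some/path/on/azure/datalake");

    // assumption: wire the fetched OAuth2 client credentials into the Hadoop configuration
    // using the adl:// connector's configuration keys
    Configuration hadoopConf = spark.sparkContext().hadoopConfiguration();
    hadoopConf.set("fs.adl.oauth2.access.token.provider.type", "ClientCredential");
    hadoopConf.set("fs.adl.oauth2.client.id", myCredentials.clientId());
    hadoopConf.set("fs.adl.oauth2.credential", myCredentials.clientSecret());
    hadoopConf.set("fs.adl.oauth2.refresh.url", myCredentials.refreshUrl());

    Dataset<Row> people = spark.read().json("adl://examples/src/main/resources/people.json");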

Any help is appreciated.

Thanks,
Imtiaz

The ADL filesystem supports addDelegationTokens(), allowing the caller to collect the delegation
tokens of the currently authenticated user and then pass them along with the request, which
is exactly what Spark should be doing in spark-submit.

If you want to do it yourself, look in SparkHadoopUtil (I think; the IDE is closed right now)
and see how the tokens are picked up and then passed around: marshalled over the job request,
unmarshalled afterwards and picked up, with bits of the UserGroupInformation class doing the
low-level work.

A Java snippet to write the tokens to the path tokenFile:

    // needs org.apache.hadoop.fs.FileSystem, org.apache.hadoop.security.Credentials,
    // org.apache.hadoop.security.token.Token
    FileSystem fs = FileSystem.get(conf);
    Credentials cred = new Credentials();
    // collect the delegation tokens of the currently authenticated user
    Token<?>[] tokens = fs.addDelegationTokens(renewer, cred);
    // persist them so another process can load them later
    cred.writeTokenStorageFile(tokenFile, conf);

You can then read that file in elsewhere, and then (somehow) get the FS to use those tokens.
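The reading side could look roughly like this, as a sketch (assuming the tokens are attached to the current UGI so a later FileSystem.get() can find them):

    // read the token file written above and attach its tokens to the current user;
    // needs org.apache.hadoop.security.UserGroupInformation
    Credentials loaded = Credentials.readTokenStorageFile(tokenFile, conf);
    UserGroupInformation.getCurrentUser().addCredentials(loaded);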

Otherwise, ADL supports OAuth, so you may be able to use any OAuth library for this. hadoop-azure-datalake
pulls in okhttp for that:

    <dependency>
      <groupId>com.squareup.okhttp</groupId>
      <artifactId>okhttp</artifactId>
      <version>2.4.0</version>
    </dependency>
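For illustration only, a client-credentials token request with that okhttp version might look roughly like this (the AAD endpoint, tenant/client IDs and form parameters here are my assumptions, not anything the connector mandates):

    // needs com.squareup.okhttp.{OkHttpClient, Request, RequestBody, Response, FormEncodingBuilder}
    OkHttpClient client = new OkHttpClient();
    // assumed AAD OAuth2 client-credential parameters
    RequestBody form = new FormEncodingBuilder()
        .add("grant_type", "client_credentials")
        .add("client_id", clientId)
        .add("client_secret", clientSecret)
        .add("resource", "https://datalake.azure.net/")
        .build();
    Request request = new Request.Builder()
        .url("https://login.microsoftonline.com/" + tenantId + "/oauth2/token")
        .post(form)
        .build();
    Response response = client.newCall(request).execute();
    String tokenJson = response.body().string();  // parse the access_token out of the JSON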

-Steve
