spark-user mailing list archives

From Ralf Heyde ...@hubrick.com>
Subject Accessing AWS S3 in Frankfurt (v4 only - AWS4-HMAC-SHA256)
Date Fri, 20 Mar 2015 13:53:18 GMT
Hey,

We want to run a job that accesses S3 from EC2 instances. The job runs in a
self-provided Spark cluster (1.3.0) on EC2 instances. In Ireland everything
works as expected.

I just tried to move the data from Ireland to Frankfurt. AWS S3 enforces v4
of its API there, which means access is only possible via AWS4-HMAC-SHA256.

That would be fine, but I don't get access there. What I have tried so far:

I tried all of the approaches below with these URLs:
A) "s3n://<key>:<secret>@<bucket>/<path>/"
B) "s3://<key>:<secret>@<bucket>/<path>/"
C) "s3n://<bucket>/<path>/"
D) "s3://<bucket>/<path>/"
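
For reference, a minimal Python sketch of how a generic URL parser splits
forms A) to D) into credentials, bucket and path. The function name
split_s3_url and the sample values are hypothetical, and this is not how
Hadoop or jets3t parse the URL internally; it only illustrates the URL
structure. One thing it makes visible: a secret containing "/" has to be
percent-encoded (%2F), otherwise a generic parser cuts the authority part
at that slash.

```python
from urllib.parse import unquote, urlsplit


def split_s3_url(url):
    """Split an s3:// or s3n:// URL into its parts (illustrative helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        # username/password are None when the URL carries no credentials (C, D)
        "access_key": unquote(parts.username) if parts.username else None,
        "secret_key": unquote(parts.password) if parts.password else None,
        # urlsplit lowercases the host; S3 bucket names are lowercase anyway
        "bucket": parts.hostname,
        "path": parts.path,
    }


if __name__ == "__main__":
    # Placeholder credentials, not real ones
    print(split_s3_url("s3n://AKIDEXAMPLE:ab%2Fcd@my-bucket/some/path/"))
    print(split_s3_url("s3://my-bucket/some/path/"))
```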

1a. Setting environment variables in the operating system
1b. Found a way to set the access key/secret in SparkConf like this (I
guess this does not have any effect):
   sc.set("AWS_ACCESS_KEY_ID", id)
   sc.set("AWS_SECRET_ACCESS_KEY", secret)

2. Tried to use a more up-to-date jets3t client (somehow I was not able
to get the new version running)
3. Tried in-URL basic authentication (A+B)
4. Setting the Hadoop configuration:
hadoopConfiguration.set("fs.s3n.impl",
"org.apache.hadoop.fs.s3.S3FileSystem");
hadoopConfiguration.set("fs.s3n.awsAccessKeyId", key);
hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secret);

hadoopConfiguration.set("fs.s3.impl",
"org.apache.hadoop.fs.s3.S3FileSystem");
hadoopConfiguration.set("fs.s3.awsAccessKeyId", "myAccessKey");
hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "myAccessSecret");

-->
Caused by: org.jets3t.service.S3ServiceException: S3 GET failed for
'/%2FEAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' XML Error
Message: <?xml version="1.0"
encoding="UTF-8"?><Error><Code>InvalidRequest</Code><Message>The
authorization mechanism you have provided is not supported. Please use
AWS4-HMAC-SHA256.</Message><RequestId>43F8F02E767DC4A2</RequestId><HostId>wgMeAEYcZZa/2BazQ9TA+PAkUxt5l+ExnT4Emb+1Uk5KhWfJu5C8Xcesm1AXCfJ9nZJMyh4wPX8=</HostId></Error>

4b. Setting the Hadoop configuration with NativeS3FileSystem instead:
hadoopConfiguration.set("fs.s3n.impl",
"org.apache.hadoop.fs.s3native.NativeS3FileSystem");
hadoopConfiguration.set("fs.s3n.awsAccessKeyId", key);
hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secret);

hadoopConfiguration.set("fs.s3.impl",
"org.apache.hadoop.fs.s3native.NativeS3FileSystem");
hadoopConfiguration.set("fs.s3.awsAccessKeyId", "myAccessKey");
hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "myAccessSecret");

-->
Caused by: org.jets3t.service.S3ServiceException: S3 HEAD request failed
for '/EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' -
ResponseCode=400, ResponseMessage=Bad Request
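
The request paths in the two error messages above are percent-encoded
object keys. A small Python sketch (the helper name decode_request_path is
hypothetical) decodes them so the two attempts can be compared. Decoding
only shows what was requested: the S3FileSystem attempt asked for a key
with a leading slash, the NativeS3FileSystem attempt for one without.

```python
from urllib.parse import unquote


def decode_request_path(request_path):
    """Drop the leading '/' of the HTTP path and percent-decode the rest."""
    if request_path.startswith("/"):
        request_path = request_path[1:]
    return unquote(request_path)


# Paths copied from the two error messages above
s3_block_path = "/%2FEAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz"
s3_native_path = "/EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz"

if __name__ == "__main__":
    print(decode_request_path(s3_block_path))
    print(decode_request_path(s3_native_path))
```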

5. Without the Hadoop configuration:
Exception in thread "main" java.lang.IllegalArgumentException: AWS Access
Key ID and Secret Access Key must be specified as the username or password
(respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or
fs.s3.awsSecretAccessKey properties (respectively).

6. Without the Hadoop configuration, but with credentials passed in the S3 URL:
with A) Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception:
org.jets3t.service.S3ServiceException: S3 HEAD request failed for
'/EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' -
ResponseCode=400, ResponseMessage=Bad Request
with B) Exception in thread "main" java.lang.IllegalArgumentException: AWS
Access Key ID and Secret Access Key must be specified as the username or
password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId
or fs.s3.awsSecretAccessKey properties (respectively).


Drilling down into the job, I can see that the RestStorageService recognizes
AWS4-HMAC-SHA256, but somehow it still gets a ResponseCode 400 (log below; I
replaced the key and the encoded secret with XXX_*_XXX):

15/03/20 11:25:31 WARN RestStorageService: Retrying request with
"AWS4-HMAC-SHA256" signing mechanism: GET
https://frankfurt.ingestion.batch.s3.amazonaws.com:443/?max-keys=1&prefix=EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz%2F&delimiter=%2F
HTTP/1.1
15/03/20 11:25:31 WARN RestStorageService: Retrying request following error
response: GET
'/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/'
-- ResponseCode: 400, ResponseStatus: Bad Request, Request Headers: [Date:
Fri, 20 Mar 2015 11:25:31 GMT, Authorization: AWS
XXX_MY_KEY_XXX:XXX_I_GUESS_SECRET_XXX], Response Headers:
[x-amz-request-id: 7E6F85873D69D14E, x-amz-id-2:
rGFW+kRfURzz3DlY/m/M8h054MmHu8bxJAtKVHUmov/VY7pBXvtMvbQTXxA7bffpu4xxf4rGmL4=,
x-amz-region: eu-central-1, Content-Type: application/xml,
Transfer-Encoding: chunked, Date: Fri, 20 Mar 2015 11:25:31 GMT,
Connection: close, Server: AmazonS3]
15/03/20 11:25:32 WARN RestStorageService: Retrying request after automatic
adjustment of Host endpoint from "frankfurt.ingestion.batch.s3.amazonaws.com"
to "frankfurt.ingestion.batch.s3-eu-central-1.amazonaws.com" following
request signing error using AWS request signing version 4: GET
https://frankfurt.ingestion.batch.s3-eu-central-1.amazonaws.com:443/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/
HTTP/1.1
15/03/20 11:25:32 WARN RestStorageService: Retrying request following error
response: GET
'/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/'
-- ResponseCode: 400, ResponseStatus: Bad Request, Request Headers: [Date:
Fri, 20 Mar 2015 11:25:31 GMT, x-amz-content-sha256:
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, Host:
frankfurt.ingestion.batch.s3.amazonaws.com, x-amz-date: 20150320T112531Z,
Authorization: AWS4-HMAC-SHA256
Credential=XXX_MY_KEY_XXX/20150320/us-east-1/s3/aws4_request,SignedHeaders=date;host;x-amz-content-sha256;x-amz-date,Signature=2098d3175c4304e44be912b770add7594d1d1b44f545c3025be1748672ec60e4],
Response Headers: [x-amz-request-id: 5CABCD0D3046B267, x-amz-id-2:
V65tW1lbSybbN3R3RMKBjJFz7xUgJDubSUm/XKXTypg7qfDtkSFRt2I9CMo2Qo2OAA+E44hiazg=,
Content-Type: application/xml, Transfer-Encoding: chunked, Date: Fri, 20
Mar 2015 11:25:32 GMT, Connection: close, Server: AmazonS3]
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
Input path does not exist:
s3n://frankfurt.ingestion.batch/EAN/2015-03-09-72640385/input/HotelImageList.gz
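
A note on the log above: the retried request carries the credential scope
20150320/us-east-1/s3/aws4_request, while the first error response reports
x-amz-region: eu-central-1. In Signature Version 4 the region is part of
the credential scope and of the signing key, so this mismatch may be worth
checking. The following Python sketch of the key derivation follows the
publicly documented SigV4 scheme, not the jets3t source. The function names
and the placeholder secret are assumptions made for illustration.

```python
import hashlib
import hmac


def _hmac_sha256(key, msg):
    """HMAC-SHA256 of a text message under a bytes key."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def credential_scope(date, region, service="s3"):
    """Scope string as it appears in the Authorization header."""
    return f"{date}/{region}/{service}/aws4_request"


def sigv4_signing_key(secret, date, region, service="s3"):
    """Derive the SigV4 signing key: date -> region -> service -> terminator."""
    k_date = _hmac_sha256(("AWS4" + secret).encode("utf-8"), date)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# SHA-256 of an empty body; this is the x-amz-content-sha256 value in the log
EMPTY_PAYLOAD_SHA256 = hashlib.sha256(b"").hexdigest()

if __name__ == "__main__":
    secret = "PLACEHOLDER_SECRET"  # not a real credential
    for region in ("us-east-1", "eu-central-1"):
        key = sigv4_signing_key(secret, "20150320", region)
        print(credential_scope("20150320", region), key.hex())
    print(EMPTY_PAYLOAD_SHA256)
```

The two regions yield different signing keys for the same secret and date,
so a request signed for us-east-1 cannot carry a signature that is valid
for eu-central-1.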


Do you have any ideas? Has any of you already been able to access S3 in
Frankfurt, and if so, how?

Cheers Ralf
