Hi Ralf,

Embedding secret keys and other credentials directly in code or URLs is a strict no for AWS; it is a major security lapse and should be avoided at all costs.

Have you tried starting the clusters using IAM roles? They are a wonderful way to start clusters or EC2 nodes, and you do not have to copy and paste any credentials either.

Try going through this article in the AWS documentation: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-iam-roles.html (although it is written for Data Pipeline, it shows the correct set of permissions to enable).

I start EC2 nodes using roles (as described in the link above) and run AWS CLI commands without copying any keys or credential files.
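For illustration, a role-based launch looks roughly like the sketch below. The role name, instance profile name, AMI ID, and instance type are placeholders I made up, not values from this thread, so substitute your own:

```shell
# Sketch only: SparkClusterRole, spark-cluster-profile, the AMI ID and
# instance type are placeholder names, not values from this thread.

# 1. Create a role that EC2 instances are allowed to assume.
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "ec2.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF
aws iam create-role --role-name SparkClusterRole \
    --assume-role-policy-document file://trust-policy.json

# 2. Grant it S3 access (AmazonS3ReadOnlyAccess is an AWS-managed policy;
#    narrow this to your bucket in a real setup).
aws iam attach-role-policy --role-name SparkClusterRole \
    --policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess

# 3. Wrap the role in an instance profile and launch nodes with it.
aws iam create-instance-profile --instance-profile-name spark-cluster-profile
aws iam add-role-to-instance-profile \
    --instance-profile-name spark-cluster-profile \
    --role-name SparkClusterRole
aws ec2 run-instances --image-id ami-xxxxxxxx --count 1 \
    --instance-type m3.large \
    --iam-instance-profile Name=spark-cluster-profile
```

On the instance, the AWS CLI and SDKs pick up temporary credentials automatically from the instance metadata service, so there are no keys to copy at all.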

Please let me know if the issue was resolved.

Regards,
Gourav

On Fri, Mar 20, 2015 at 1:53 PM, Ralf Heyde <rh@hubrick.com> wrote:
Hey, 

We want to run a job that accesses S3 from EC2 instances. The job runs on a self-provisioned Spark cluster (1.3.0) on EC2. In Ireland everything works as expected.

I just tried to move data from Ireland to Frankfurt. AWS S3 enforces v4 of its API there, which means access is only possible via AWS4-HMAC-SHA256.

That is fine in principle, but I cannot get access there. What I have tried so far:

I tried all of the approaches below with these URLs:
A) "s3n://<key>:<secret>@<bucket>/<path>/"
B) "s3://<key>:<secret>@<bucket>/<path>/"
C) "s3n://<bucket>/<path>/"
D) "s3://<bucket>/<path>/"

1a. Setting the environment variables in the operating system.
1b. Found something about setting the access key/secret in SparkConf like this (I guess this does not have any effect):
   sc.set("AWS_ACCESS_KEY_ID", id)
   sc.set("AWS_SECRET_ACCESS_KEY", secret)

2. Tried to use a more up-to-date jets3t client (somehow I was not able to get the newer version running).
3. Tried in-URL basic authentication (A and B).
4a. Setting the Hadoop configuration with S3FileSystem:
hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3.S3FileSystem");
hadoopConfiguration.set("fs.s3n.awsAccessKeyId", key);
hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secret);

hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3.S3FileSystem");
hadoopConfiguration.set("fs.s3.awsAccessKeyId", "myAccessKey");
hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "myAccessSecret");

-->
Caused by: org.jets3t.service.S3ServiceException: S3 GET failed for '/%2FEAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>InvalidRequest</Code><Message>The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.</Message><RequestId>43F8F02E767DC4A2</RequestId><HostId>wgMeAEYcZZa/2BazQ9TA+PAkUxt5l+ExnT4Emb+1Uk5KhWfJu5C8Xcesm1AXCfJ9nZJMyh4wPX8=</HostId></Error>

4b. Setting the Hadoop configuration with NativeS3FileSystem:
hadoopConfiguration.set("fs.s3n.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
hadoopConfiguration.set("fs.s3n.awsAccessKeyId", key);
hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secret);

hadoopConfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
hadoopConfiguration.set("fs.s3.awsAccessKeyId", "myAccessKey");
hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "myAccessSecret");

--> 
Caused by: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' - ResponseCode=400, ResponseMessage=Bad Request

5. Without Hadoop configuration:
Exception in thread "main" java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).

6. Without Hadoop configuration, but with credentials passed in the S3 URL:
with A) Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 HEAD request failed for '/EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' - ResponseCode=400, ResponseMessage=Bad Request
with B) Exception in thread "main" java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).


Drilling down into the job, I can see that the RestStorageService recognizes AWS4-HMAC-SHA256, but somehow it still gets a response code 400 (log below; I replaced the key / encoded secret with XXX_*_XXX):

15/03/20 11:25:31 WARN RestStorageService: Retrying request with "AWS4-HMAC-SHA256" signing mechanism: GET https://frankfurt.ingestion.batch.s3.amazonaws.com:443/?max-keys=1&prefix=EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz%2F&delimiter=%2F HTTP/1.1
15/03/20 11:25:31 WARN RestStorageService: Retrying request following error response: GET '/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/' -- ResponseCode: 400, ResponseStatus: Bad Request, Request Headers: [Date: Fri, 20 Mar 2015 11:25:31 GMT, Authorization: AWS XXX_MY_KEY_XXX:XXX_I_GUESS_SECRET_XXX], Response Headers: [x-amz-request-id: 7E6F85873D69D14E, x-amz-id-2: rGFW+kRfURzz3DlY/m/M8h054MmHu8bxJAtKVHUmov/VY7pBXvtMvbQTXxA7bffpu4xxf4rGmL4=, x-amz-region: eu-central-1, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Fri, 20 Mar 2015 11:25:31 GMT, Connection: close, Server: AmazonS3]
15/03/20 11:25:32 WARN RestStorageService: Retrying request after automatic adjustment of Host endpoint from "frankfurt.ingestion.batch.s3.amazonaws.com" to "frankfurt.ingestion.batch.s3-eu-central-1.amazonaws.com" following request signing error using AWS request signing version 4: GET https://frankfurt.ingestion.batch.s3-eu-central-1.amazonaws.com:443/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/ HTTP/1.1
15/03/20 11:25:32 WARN RestStorageService: Retrying request following error response: GET '/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/' -- ResponseCode: 400, ResponseStatus: Bad Request, Request Headers: [Date: Fri, 20 Mar 2015 11:25:31 GMT, x-amz-content-sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, Host: frankfurt.ingestion.batch.s3.amazonaws.com, x-amz-date: 20150320T112531Z, Authorization: AWS4-HMAC-SHA256 Credential=XXX_MY_KEY_XXX/20150320/us-east-1/s3/aws4_request,SignedHeaders=date;host;x-amz-content-sha256;x-amz-date,Signature=2098d3175c4304e44be912b770add7594d1d1b44f545c3025be1748672ec60e4], Response Headers: [x-amz-request-id: 5CABCD0D3046B267, x-amz-id-2: V65tW1lbSybbN3R3RMKBjJFz7xUgJDubSUm/XKXTypg7qfDtkSFRt2I9CMo2Qo2OAA+E44hiazg=, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Fri, 20 Mar 2015 11:25:32 GMT, Connection: close, Server: AmazonS3]
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3n://frankfurt.ingestion.batch/EAN/2015-03-09-72640385/input/HotelImageList.gz


Do you have any ideas? Has anybody here been able to access S3 in Frankfurt, and if so, how?

Cheers Ralf