spark-user mailing list archives

From Gourav Sengupta <gourav.sengu...@gmail.com>
Subject Re: Accessing AWS S3 in Frankfurt (v4 only - AWS4-HMAC-SHA256)
Date Fri, 20 Mar 2015 14:08:36 GMT
Hi Ralf,

Passing access keys and secrets around in URLs or configuration is a strict
no on AWS; it is a major security lapse and should be avoided at any cost.

Have you tried starting the cluster using IAM roles? They are a wonderful way
to start clusters or EC2 nodes, and you do not have to copy and paste any
keys either.

Try going through this AWS article:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-iam-roles.html
(although it is written for Data Pipeline, it shows the correct set of
permissions to enable).

I start EC2 nodes using roles (as described in the link above) and run the
AWS CLI commands without copying any keys or files.
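
For context, here is a rough sketch of why no keys need to be copied once a
role is attached to the instance: the AWS CLI and SDKs fetch temporary
credentials from the EC2 instance metadata service. The class and method
names are my own, "my-spark-role" is a placeholder role name, and instances
that enforce IMDSv2 additionally require a session token, which this sketch
leaves out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Scanner;

class InstanceRoleSketch {
    // Link-local address of the EC2 instance metadata service.
    static final String BASE =
        "http://169.254.169.254/latest/meta-data/iam/security-credentials/";

    // URL that serves the temporary credentials of the given role as JSON.
    static String credentialsUrl(String roleName) {
        return BASE + roleName;
    }

    // Fetches the JSON document; this only succeeds on an EC2 instance
    // that has the role attached.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(1000);
        conn.setReadTimeout(1000);
        try (InputStream in = conn.getInputStream();
             Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : "";
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // "my-spark-role" is a hypothetical role name.
        System.out.println(fetch(credentialsUrl("my-spark-role")));
    }
}
```

The returned JSON contains a short-lived access key, secret and session
token, so nothing long-lived has to be stored on the node.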

Please let me know if the issue was resolved.

Regards,
Gourav

On Fri, Mar 20, 2015 at 1:53 PM, Ralf Heyde <rh@hubrick.com> wrote:

> Hey,
>
> We want to run a job that accesses S3 from EC2 instances. The job runs in a
> self-provided Spark cluster (1.3.0) on EC2 instances. In Ireland everything
> works as expected.
>
> I just tried to move the data from Ireland to Frankfurt. AWS S3 enforces v4
> of its API there, which means access is only possible via AWS4-HMAC-SHA256.
>
> That is fine in itself, but I cannot get access there. What I have tried so far:
>
> I tried all of the approaches below with these URLs:
> A) "s3n://<key>:<secret>@<bucket>/<path>/"
> B) "s3://<key>:<secret>@<bucket>/<path>/"
> C) "s3n://<bucket>/<path>/"
> D) "s3://<bucket>/<path>/"
>
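
A small sketch of how URL forms A and C above split into credentials and
bucket, using only java.net.URI. The class and helper names are mine, and
KEY/SECRET are placeholders. As far as I know, a secret that contains a "/"
breaks the in-URL form, because the authority part of the URL ends at the
first slash; the last call illustrates that under this assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;

class S3UrlSketch {
    // Parses the URL, returning null if it is not a valid URI at all.
    static URI parse(String url) {
        try {
            return new URI(url);
        } catch (URISyntaxException e) {
            return null;
        }
    }

    // The bucket is the host part of the URL.
    static String bucketOf(String url) {
        URI uri = parse(url);
        return uri == null ? null : uri.getHost();
    }

    // "<key>:<secret>" if credentials are embedded, otherwise null.
    static String userInfoOf(String url) {
        URI uri = parse(url);
        return uri == null ? null : uri.getUserInfo();
    }

    public static void main(String[] args) {
        String a = "s3n://KEY:SECRET@frankfurt.ingestion.batch/EAN/input/";
        String c = "s3n://frankfurt.ingestion.batch/EAN/input/";
        // A secret with a slash: the bucket is no longer seen as the host.
        String broken = "s3n://KEY:ab/cd@frankfurt.ingestion.batch/EAN/input/";

        System.out.println(bucketOf(a) + " | " + userInfoOf(a));
        System.out.println(bucketOf(c) + " | " + userInfoOf(c));
        System.out.println(bucketOf(broken) + " | " + userInfoOf(broken));
    }
}
```

This only shows the URL parsing; it says nothing about which signing
mechanism the client uses afterwards.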
> 1a. Setting environment variables in the operating system.
> 1b. Setting the access key and secret in SparkConf like this (I guess this
> does not have any effect):
>    sc.set("AWS_ACCESS_KEY_ID", id)
>    sc.set("AWS_SECRET_ACCESS_KEY", secret)
>
> 2. Tried to use a more up-to-date jets3t client (somehow I was not able
> to get the new version running).
> 3. Tried in-URL basic authentication (A+B).
> 4. Setting the Hadoop configuration:
> hadoopConfiguration.set("fs.s3n.impl",
> "org.apache.hadoop.fs.s3.S3FileSystem");
> hadoopConfiguration.set("fs.s3n.awsAccessKeyId", key);
> hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secret);
>
> hadoopConfiguration.set("fs.s3.impl",
> "org.apache.hadoop.fs.s3.S3FileSystem");
> hadoopConfiguration.set("fs.s3.awsAccessKeyId", "myAccessKey");
> hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "myAccessSecret");
>
> -->
> Caused by: org.jets3t.service.S3ServiceException: S3 GET failed for
> '/%2FEAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' XML Error
> Message: <?xml version="1.0"
> encoding="UTF-8"?><Error><Code>InvalidRequest</Code><Message>The
> authorization mechanism you have provided is not supported. Please use
> AWS4-HMAC-SHA256.</Message><RequestId>43F8F02E767DC4A2</RequestId><HostId>wgMeAEYcZZa/2BazQ9TA+PAkUxt5l+ExnT4Emb+1Uk5KhWfJu5C8Xcesm1AXCfJ9nZJMyh4wPX8=</HostId></Error>
>
> 4b. Setting the Hadoop configuration with NativeS3FileSystem:
> hadoopConfiguration.set("fs.s3n.impl",
> "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
> hadoopConfiguration.set("fs.s3n.awsAccessKeyId", key);
> hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secret);
>
> hadoopConfiguration.set("fs.s3.impl",
> "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
> hadoopConfiguration.set("fs.s3.awsAccessKeyId", "myAccessKey");
> hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "myAccessSecret");
>
> -->
> Caused by: org.jets3t.service.S3ServiceException: S3 HEAD request failed
> for '/EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' -
> ResponseCode=400, ResponseMessage=Bad Request
>
> 5. Without Hadoop config:
> Exception in thread "main" java.lang.IllegalArgumentException: AWS Access
> Key ID and Secret Access Key must be specified as the username or password
> (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or
> fs.s3.awsSecretAccessKey properties (respectively).
>
> 6. Without Hadoop config, but with credentials passed in the S3 URL:
> with A) Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception:
> org.jets3t.service.S3ServiceException: S3 HEAD request failed for
> '/EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' -
> ResponseCode=400, ResponseMessage=Bad Request
> with B) Exception in thread "main" java.lang.IllegalArgumentException: AWS
> Access Key ID and Secret Access Key must be specified as the username or
> password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId
> or fs.s3.awsSecretAccessKey properties (respectively).
>
>
> Drilling down into the job, I can see that the RestStorageService recognizes
> AWS4-HMAC-SHA256, but somehow it still gets a ResponseCode 400 (log below;
> I replaced the key and the encoded secret with XXX_*_XXX):
>
> 15/03/20 11:25:31 WARN RestStorageService: Retrying request with
> "AWS4-HMAC-SHA256" signing mechanism: GET
> https://frankfurt.ingestion.batch.s3.amazonaws.com:443/?max-keys=1&prefix=EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz%2F&delimiter=%2F
> HTTP/1.1
> 15/03/20 11:25:31 WARN RestStorageService: Retrying request following
> error response: GET
> '/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/'
> -- ResponseCode: 400, ResponseStatus: Bad Request, Request Headers: [Date:
> Fri, 20 Mar 2015 11:25:31 GMT, Authorization: AWS
> XXX_MY_KEY_XXX:XXX_I_GUESS_SECRET_XXX], Response Headers:
> [x-amz-request-id: 7E6F85873D69D14E, x-amz-id-2:
> rGFW+kRfURzz3DlY/m/M8h054MmHu8bxJAtKVHUmov/VY7pBXvtMvbQTXxA7bffpu4xxf4rGmL4=,
> x-amz-region: eu-central-1, Content-Type: application/xml,
> Transfer-Encoding: chunked, Date: Fri, 20 Mar 2015 11:25:31 GMT,
> Connection: close, Server: AmazonS3]
> 15/03/20 11:25:32 WARN RestStorageService: Retrying request after
> automatic adjustment of Host endpoint from "
> frankfurt.ingestion.batch.s3.amazonaws.com" to "
> frankfurt.ingestion.batch.s3-eu-central-1.amazonaws.com" following
> request signing error using AWS request signing version 4: GET
> https://frankfurt.ingestion.batch.s3-eu-central-1.amazonaws.com:443/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/
> HTTP/1.1
> 15/03/20 11:25:32 WARN RestStorageService: Retrying request following
> error response: GET
> '/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/'
> -- ResponseCode: 400, ResponseStatus: Bad Request, Request Headers: [Date:
> Fri, 20 Mar 2015 11:25:31 GMT, x-amz-content-sha256:
> e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, Host:
> frankfurt.ingestion.batch.s3.amazonaws.com, x-amz-date: 20150320T112531Z,
> Authorization: AWS4-HMAC-SHA256
> Credential=XXX_MY_KEY_XXX/20150320/us-east-1/s3/aws4_request,SignedHeaders=date;host;x-amz-content-sha256;x-amz-date,Signature=2098d3175c4304e44be912b770add7594d1d1b44f545c3025be1748672ec60e4],
> Response Headers: [x-amz-request-id: 5CABCD0D3046B267, x-amz-id-2:
> V65tW1lbSybbN3R3RMKBjJFz7xUgJDubSUm/XKXTypg7qfDtkSFRt2I9CMo2Qo2OAA+E44hiazg=,
> Content-Type: application/xml, Transfer-Encoding: chunked, Date: Fri, 20
> Mar 2015 11:25:32 GMT, Connection: close, Server: AmazonS3]
> Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
> Input path does not exist:
> s3n://frankfurt.ingestion.batch/EAN/2015-03-09-72640385/input/HotelImageList.gz
>
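
One detail in the log above: the v4-signed retry carries the credential scope
20150320/us-east-1/s3/aws4_request, while the first response reports
x-amz-region: eu-central-1. In Signature Version 4 the signing key is derived
from the secret, the date, the region and the service, so a key scoped to
us-east-1 differs from the one Frankfurt expects. That may be why the retry
still returns 400, though I have not verified this against jets3t. A minimal
sketch of the key derivation using only the JDK; the class name and the
"EXAMPLE_SECRET" value are placeholders of mine:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class SigV4Sketch {
    // HMAC-SHA256 of data under the given key.
    static byte[] hmac(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // SigV4 signing key: chained HMACs over date, region, service and
    // the literal "aws4_request", seeded with "AWS4" + secret.
    static byte[] signingKey(String secret, String date,
                             String region, String service) {
        byte[] kSecret = ("AWS4" + secret).getBytes(StandardCharsets.UTF_8);
        byte[] kDate = hmac(kSecret, date);
        byte[] kRegion = hmac(kDate, region);
        byte[] kService = hmac(kRegion, service);
        return hmac(kService, "aws4_request");
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Hex SHA-256 of the request payload (the x-amz-content-sha256 header).
    static String sha256Hex(String payload) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return hex(md.digest(payload.getBytes(StandardCharsets.UTF_8)));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same secret and date, different region: the keys do not match.
        System.out.println(hex(signingKey("EXAMPLE_SECRET", "20150320", "us-east-1", "s3")));
        System.out.println(hex(signingKey("EXAMPLE_SECRET", "20150320", "eu-central-1", "s3")));
        // Hash of an empty body, as sent with a GET request.
        System.out.println(sha256Hex(""));
    }
}
```

The last line prints the same value as the x-amz-content-sha256 header in the
log, which is simply the SHA-256 of an empty request body.
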
>
> Do you have any ideas? Has any of you already been able to access S3 in
> Frankfurt and, if so, how?
>
> Cheers Ralf
>
>
>
