spark-user mailing list archives

From Ralf Heyde ...@hubrick.com>
Subject Re: Accessing AWS S3 in Frankfurt (v4 only - AWS4-HMAC-SHA256)
Date Fri, 20 Mar 2015 15:15:37 GMT
Good idea, I will try that.
But assuming only the data is located there, the problem will still occur.

On Fri, Mar 20, 2015 at 3:08 PM, Gourav Sengupta <gourav.sengupta@gmail.com>
wrote:

> Hi Ralf,
>
> Using secret keys and authorization details is a strict no for AWS; they
> are major security lapses and should be avoided at any cost.
>
> Have you tried starting the clusters using IAM roles? They are a wonderful
> way to start clusters or EC2 nodes, and you do not have to copy and paste
> any permissions either.
>
> Try going through this AWS article:
> http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-iam-roles.html
> (although it is written for Data Pipeline, it shows the correct set of
> permissions to enable).
>
> I start EC2 nodes using roles (as described in the link above) and run the
> AWS CLI commands without copying any keys or files.
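For context on the roles approach: on an EC2 instance launched with an IAM role, temporary credentials for that role are served by the instance metadata service under http://169.254.169.254/latest/meta-data/iam/security-credentials/ followed by the role name, and that is where the AWS CLI and SDKs pick them up without any copied keys. A minimal Java sketch, assuming a hypothetical role name `spark-node-role` and metadata access without a session token; whether a particular Hadoop S3 connector reads these credentials is not covered here.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

// Sketch: read the temporary credentials EC2 serves for an attached IAM role.
// Assumes metadata access without a session token; the role name is hypothetical.
class RoleCredentials {
    static final String METADATA_BASE =
            "http://169.254.169.254/latest/meta-data/iam/security-credentials/";

    // Metadata URL for the given role name.
    static String credentialsUrl(String roleName) {
        return METADATA_BASE + roleName;
    }

    // Returns the JSON document (AccessKeyId, SecretAccessKey, Token, Expiration, ...).
    // Only works on an EC2 instance that has the role attached.
    static String fetch(String roleName) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(credentialsUrl(roleName)).openConnection();
        conn.setConnectTimeout(2000); // fail fast when not running on EC2
        conn.setReadTimeout(2000);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            return in.lines().collect(Collectors.joining("\n"));
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch(args.length > 0 ? args[0] : "spark-node-role"));
    }
}
```

Off EC2, `fetch` throws an IOException once the timeouts expire, so no keys ever need to be written into code or configuration files.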
>
> Please let me know if the issue was resolved.
>
> Regards,
> Gourav
>
> On Fri, Mar 20, 2015 at 1:53 PM, Ralf Heyde <rh@hubrick.com> wrote:
>
>> Hey,
>>
>> We want to run a job that accesses S3 from EC2 instances. The job runs in a
>> self-managed Spark cluster (1.3.0) on EC2 instances. In Ireland everything
>> works as expected.
>>
>> I just tried to move the data from Ireland to Frankfurt. There, AWS S3
>> enforces version 4 of its request signing, meaning access is only possible
>> via AWS4-HMAC-SHA256.
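What AWS4-HMAC-SHA256 changes compared with the older `AWS key:signature` scheme (visible in the first request of the log further down): the signature is computed with a key derived from the secret through a chain of HMAC-SHA256 steps over the date, the region, the service and the fixed string `aws4_request`, and the request carries a SHA-256 hash of its payload. A Java sketch of those two building blocks, illustrative only and not the jets3t code:

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch of two Signature Version 4 (AWS4-HMAC-SHA256) building blocks.
// Illustrative only; not the jets3t implementation.
class SigV4 {
    // HMAC-SHA256 of `data` under `key`.
    static byte[] hmac(byte[] key, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e); // HmacSHA256 ships with every Java platform
        }
    }

    // Signing key = HMAC chain over date (yyyyMMdd), region, service and "aws4_request".
    static byte[] signingKey(String secret, String date, String region, String service) {
        byte[] kDate = hmac(("AWS4" + secret).getBytes(StandardCharsets.UTF_8), date);
        byte[] kRegion = hmac(kDate, region);
        byte[] kService = hmac(kRegion, service);
        return hmac(kService, "aws4_request");
    }

    // Hex-encoded SHA-256 of the payload, as sent in the x-amz-content-sha256 header.
    static String sha256Hex(byte[] payload) {
        try {
            return toHex(MessageDigest.getInstance("SHA-256").digest(payload));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e); // SHA-256 ships with every Java platform
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Because the region is an input to `signingKey`, a key derived for us-east-1 differs from one derived for eu-central-1. `sha256Hex(new byte[0])` yields e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, the x-amz-content-sha256 value sent for the body-less GET in the log.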
>>
>> This is still OK, but I don't get access there. What I have tried so far:
>>
>> I tried all of the approaches below with each of these URLs:
>> A) "s3n://<key>:<secret>@<bucket>/<path>/"
>> B) "s3://<key>:<secret>@<bucket>/<path>/"
>> C) "s3n://<bucket>/<path>/"
>> D) "s3://<bucket>/<path>/"
>>
>> 1a. Setting environment variables in the operating system.
>> 1b. Setting the access key/secret in SparkConf, following a suggestion I
>> found (I guess this does not have any effect):
>>    sc.set("AWS_ACCESS_KEY_ID", id)
>>    sc.set("AWS_SECRET_ACCESS_KEY", secret)
>>
>> 2. Trying a more up-to-date jets3t client (somehow I was not able to get
>> the new version running).
>> 3. Trying in-URL basic authentication (A and B).
>> 4a. Setting the Hadoop configuration with S3FileSystem:
>> hadoopConfiguration.set("fs.s3n.impl",
>> "org.apache.hadoop.fs.s3.S3FileSystem");
>> hadoopConfiguration.set("fs.s3n.awsAccessKeyId", key);
>> hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secret);
>>
>> hadoopConfiguration.set("fs.s3.impl",
>> "org.apache.hadoop.fs.s3.S3FileSystem");
>> hadoopConfiguration.set("fs.s3.awsAccessKeyId", "myAccessKey");
>> hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "myAccessSecret");
>>
>> -->
>> Caused by: org.jets3t.service.S3ServiceException: S3 GET failed for
>> '/%2FEAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' XML Error
>> Message: <?xml version="1.0"
>> encoding="UTF-8"?><Error><Code>InvalidRequest</Code><Message>The
>> authorization mechanism you have provided is not supported. Please use
>> AWS4-HMAC-SHA256.</Message><RequestId>43F8F02E767DC4A2</RequestId><HostId>wgMeAEYcZZa/2BazQ9TA+PAkUxt5l+ExnT4Emb+1Uk5KhWfJu5C8Xcesm1AXCfJ9nZJMyh4wPX8=</HostId></Error>
>>
>> 4b. Setting the Hadoop configuration with NativeS3FileSystem instead:
>> hadoopConfiguration.set("fs.s3n.impl",
>> "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
>> hadoopConfiguration.set("fs.s3n.awsAccessKeyId", key);
>> hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secret);
>>
>> hadoopConfiguration.set("fs.s3.impl",
>> "org.apache.hadoop.fs.s3native.NativeS3FileSystem");
>> hadoopConfiguration.set("fs.s3.awsAccessKeyId", "myAccessKey");
>> hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "myAccessSecret");
>>
>> -->
>> Caused by: org.jets3t.service.S3ServiceException: S3 HEAD request failed
>> for '/EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' -
>> ResponseCode=400, ResponseMessage=Bad Request
>>
>> 5. Without the Hadoop configuration:
>> Exception in thread "main" java.lang.IllegalArgumentException: AWS Access
>> Key ID and Secret Access Key must be specified as the username or password
>> (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or
>> fs.s3.awsSecretAccessKey properties (respectively).
>>
>> 6. Without the Hadoop configuration, but with the credentials passed in the
>> S3 URL:
>> with A) Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception:
>> org.jets3t.service.S3ServiceException: S3 HEAD request failed for
>> '/EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' -
>> ResponseCode=400, ResponseMessage=Bad Request
>> with B) Exception in thread "main" java.lang.IllegalArgumentException:
>> AWS Access Key ID and Secret Access Key must be specified as the username
>> or password (respectively) of a s3 URL, or by setting the
>> fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
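On 1b: as far as I know, Spark (1.3 included) hands SparkConf entries to Hadoop only when their key starts with `spark.hadoop.`, copying them into the Hadoop configuration with that prefix stripped; keys such as AWS_ACCESS_KEY_ID are not forwarded. It also copies the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, when both are set, into the fs.s3/fs.s3n key properties, which is the path 1a relies on. A Java sketch of the prefix rule only, an illustration rather than Spark's source:

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch (not Spark's source): "spark.hadoop."-prefixed SparkConf entries become
// Hadoop configuration keys with the prefix removed; all other keys are skipped.
class SparkHadoopKeys {
    static final String PREFIX = "spark.hadoop.";

    static Map<String, String> toHadoopKeys(Map<String, String> sparkConf) {
        Map<String, String> hadoop = new TreeMap<>();
        for (Map.Entry<String, String> e : sparkConf.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                hadoop.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        return hadoop;
    }
}
```

Under this rule, setting `spark.hadoop.fs.s3n.awsAccessKeyId` in SparkConf is equivalent to setting `fs.s3n.awsAccessKeyId` on the Hadoop configuration directly. It changes only how the keys reach Hadoop, not which signing version is used.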
>>
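On the two Hadoop-configuration attempts in step 4: the failing request paths differ, '/%2FEAN%2F...' with org.apache.hadoop.fs.s3.S3FileSystem and '/EAN%2F...' with NativeS3FileSystem. A plausible reading, not verified here against the Hadoop source, is that the block-based S3FileSystem (intended for the s3:// block store rather than s3n:// objects) keeps the leading '/' of the path in the object key, while the native file system strips it. The Java sketch below reproduces both paths from the object path; it uses URLEncoder only because these keys contain no spaces (form encoding turns a space into '+', unlike S3 request encoding).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch: rebuild the request paths seen in the two error messages by
// percent-encoding the object key (URLEncoder turns "/" into "%2F").
class KeyPaths {
    static String encode(String key) {
        try {
            return URLEncoder.encode(key, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Key keeps the leading "/", as the block store appears to do.
    static String blockStoreRequestPath(String path) {
        return "/" + encode(path);
    }

    // Key drops the leading "/", as the native store appears to do.
    static String nativeRequestPath(String path) {
        return "/" + encode(path.substring(1));
    }
}
```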
>>
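On 3 and 6: with approaches A and B, the key and secret travel in the user-info part of the URL. A Java sketch with java.net.URI and placeholder credentials shows where they land. It also shows one pitfall that may be worth ruling out, as an assumption rather than a diagnosis of the errors above: AWS secret keys can contain '/', and an unencoded '/' ends the authority part, so both the credentials and the bucket name are lost. Hadoop's own handling of such URLs may differ from plain java.net.URI.

```java
import java.net.URI;

// Sketch: where in-URL credentials end up when an s3n:// URL is parsed.
// MYKEY / MYSECRET are placeholders, not real credentials.
class S3UrlParts {
    static String userInfo(String url) {
        return URI.create(url).getUserInfo(); // "key:secret", or null
    }

    static String bucket(String url) {
        return URI.create(url).getHost(); // null if no host could be parsed
    }

    public static void main(String[] args) {
        String ok = "s3n://MYKEY:MYSECRET@frankfurt.ingestion.batch/EAN/";
        String slash = "s3n://MYKEY:MY/SECRET@frankfurt.ingestion.batch/EAN/";
        System.out.println(userInfo(ok) + " @ " + bucket(ok));       // MYKEY:MYSECRET @ frankfurt.ingestion.batch
        System.out.println(userInfo(slash) + " @ " + bucket(slash)); // null @ null
    }
}
```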
>> Drilling down into the job, I can see that the RestStorageService
>> recognizes AWS4-HMAC-SHA256, but it still gets response code 400 (log
>> below; I replaced the key and encoded secret with XXX_*_XXX):
>>
>> 15/03/20 11:25:31 WARN RestStorageService: Retrying request with
>> "AWS4-HMAC-SHA256" signing mechanism: GET
>> https://frankfurt.ingestion.batch.s3.amazonaws.com:443/?max-keys=1&prefix=EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz%2F&delimiter=%2F
>> HTTP/1.1
>> 15/03/20 11:25:31 WARN RestStorageService: Retrying request following
>> error response: GET
>> '/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/'
>> -- ResponseCode: 400, ResponseStatus: Bad Request, Request Headers: [Date:
>> Fri, 20 Mar 2015 11:25:31 GMT, Authorization: AWS
>> XXX_MY_KEY_XXX:XXX_I_GUESS_SECRET_XXX], Response Headers:
>> [x-amz-request-id: 7E6F85873D69D14E, x-amz-id-2:
>> rGFW+kRfURzz3DlY/m/M8h054MmHu8bxJAtKVHUmov/VY7pBXvtMvbQTXxA7bffpu4xxf4rGmL4=,
>> x-amz-region: eu-central-1, Content-Type: application/xml,
>> Transfer-Encoding: chunked, Date: Fri, 20 Mar 2015 11:25:31 GMT,
>> Connection: close, Server: AmazonS3]
>> 15/03/20 11:25:32 WARN RestStorageService: Retrying request after
>> automatic adjustment of Host endpoint from "
>> frankfurt.ingestion.batch.s3.amazonaws.com" to "
>> frankfurt.ingestion.batch.s3-eu-central-1.amazonaws.com" following
>> request signing error using AWS request signing version 4: GET
>> https://frankfurt.ingestion.batch.s3-eu-central-1.amazonaws.com:443/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/
>> HTTP/1.1
>> 15/03/20 11:25:32 WARN RestStorageService: Retrying request following
>> error response: GET
>> '/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/'
>> -- ResponseCode: 400, ResponseStatus: Bad Request, Request Headers: [Date:
>> Fri, 20 Mar 2015 11:25:31 GMT, x-amz-content-sha256:
>> e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, Host:
>> frankfurt.ingestion.batch.s3.amazonaws.com, x-amz-date:
>> 20150320T112531Z, Authorization: AWS4-HMAC-SHA256
>> Credential=XXX_MY_KEY_XXX/20150320/us-east-1/s3/aws4_request,SignedHeaders=date;host;x-amz-content-sha256;x-amz-date,Signature=2098d3175c4304e44be912b770add7594d1d1b44f545c3025be1748672ec60e4],
>> Response Headers: [x-amz-request-id: 5CABCD0D3046B267, x-amz-id-2:
>> V65tW1lbSybbN3R3RMKBjJFz7xUgJDubSUm/XKXTypg7qfDtkSFRt2I9CMo2Qo2OAA+E44hiazg=,
>> Content-Type: application/xml, Transfer-Encoding: chunked, Date: Fri, 20
>> Mar 2015 11:25:32 GMT, Connection: close, Server: AmazonS3]
>> Exception in thread "main"
>> org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:
>> s3n://frankfurt.ingestion.batch/EAN/2015-03-09-72640385/input/HotelImageList.gz
>>
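Reading the log above: the retried request went to frankfurt.ingestion.batch.s3-eu-central-1.amazonaws.com, but its Authorization header still carries the credential scope 20150320/us-east-1/s3/aws4_request and its signed Host header still names frankfurt.ingestion.batch.s3.amazonaws.com, while the first response reported x-amz-region: eu-central-1. Since the region is part of the SigV4 signature, a scope naming a region other than the bucket's would be expected to fail, which is consistent with the 400 responses; this is a reading of the log, not a confirmed cause. A small Java sketch that extracts the region from both strings so they can be compared:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: compare the region in a SigV4 credential scope with the region in a
// 2015-style regional S3 host name ("<bucket>.s3-<region>.amazonaws.com").
class ScopeCheck {
    // Credential=<key>/<yyyyMMdd>/<region>/<service>/aws4_request
    private static final Pattern SCOPE =
            Pattern.compile("Credential=[^/]+/\\d{8}/([^/]+)/[^/]+/aws4_request");
    private static final Pattern REGIONAL_HOST =
            Pattern.compile("\\.s3-([a-z0-9-]+)\\.amazonaws\\.com$");

    static String scopeRegion(String authorizationHeader) {
        Matcher m = SCOPE.matcher(authorizationHeader);
        return m.find() ? m.group(1) : null;
    }

    static String hostRegion(String host) {
        Matcher m = REGIONAL_HOST.matcher(host);
        return m.find() ? m.group(1) : null; // null for the global s3.amazonaws.com host
    }

    public static void main(String[] args) {
        String auth = "AWS4-HMAC-SHA256 Credential=XXX_MY_KEY_XXX/20150320/us-east-1/s3/aws4_request,"
                + "SignedHeaders=date;host;x-amz-content-sha256;x-amz-date";
        String host = "frankfurt.ingestion.batch.s3-eu-central-1.amazonaws.com";
        System.out.println(scopeRegion(auth) + " vs " + hostRegion(host)); // us-east-1 vs eu-central-1
    }
}
```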
>>
>> Do you have any ideas? Has any of you already been able to access S3 in
>> Frankfurt, and if so, how?
>>
>> Cheers Ralf
>>
>>
>>
>
