hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13278) S3AFileSystem mkdirs does not need to validate parent path components
Date Fri, 15 Jun 2018 09:59:00 GMT

https://issues.apache.org/jira/browse/HADOOP-13278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16513624#comment-16513624

Steve Loughran commented on HADOOP-13278:

Linking to HADOOP-15220.

FWIW, Hadoop 3.1 has more IAM role support and handling of failed writes up the tree. Failed
reads aren't something that is coped with there, which is probably of interest to [~fabbri].
If we do want to handle that situation, it'll cover more than just mkdirs, though; I can imagine
delete() and the scan for mock directories after a PUT needing coverage too.

> S3AFileSystem mkdirs does not need to validate parent path components
> ---------------------------------------------------------------------
>                 Key: HADOOP-13278
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13278
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3, tools
>            Reporter: Adrian Petrescu
>            Priority: Minor
> According to S3 semantics, there is no conflict if a bucket contains a key named {{a/b}}
and also a directory named {{a/b/c}}. "Directories" in S3 are, after all, nothing but prefixes.
> However, the {{mkdirs}} call in {{S3AFileSystem}} does go out of its way to traverse
every parent path component for the directory it's trying to create, making sure there's no
file with that name. This is suboptimal for three main reasons:
>  * Wasted API calls, since the client is getting metadata for each path component 
>  * This can cause *major* problems with buckets whose permissions are being managed by
IAM, where access may not be granted to the root bucket, but only to some prefix. When you
call {{mkdirs}}, even on a prefix that you have access to, the traversal up the path will
cause you to eventually hit the root bucket, which will fail with a 403 - even though the
directory creation call would have succeeded.
>  * Some people might actually have a file that matches some other file's prefix... I
can't see why they would want to do that, but it's not against S3's rules.
> I've opened a pull request with a simple patch that just removes this portion of the
check. I have tested it with my team's instance of Spark + Luigi, and can confirm it works
and resolves the aforementioned permissions issue for a bucket on which we only had prefix
access.
> This is my first ticket/pull request against Hadoop, so let me know if I'm not following
some convention properly :)
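The no-conflict semantics the reporter describes can be sketched with a toy model of an S3-style object store (hypothetical class and method names, not the actual S3AFileSystem code): a flat key-to-bytes map where "directories" are nothing but key prefixes, and a mkdirs that issues a single marker PUT without walking and validating every parent component.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of an S3-style object store: a flat map of key -> bytes.
// "Directories" exist only as key prefixes, so a key "a/b" and a
// directory marker "a/b/c/" can coexist without conflict.
public class MkdirsSketch {
    static final Map<String, byte[]> store = new HashMap<>();

    // mkdirs without parent validation: one PUT of an empty marker object,
    // no GETs against "a" or "a/b" first. This is the behaviour the patch
    // argues for; the pre-patch code would probe each parent and fail with
    // a 403 if IAM denied access higher up the tree.
    static boolean mkdirs(String path) {
        String key = path.endsWith("/") ? path : path + "/";
        store.put(key, new byte[0]);
        return true;
    }

    public static void main(String[] args) {
        store.put("a/b", "data".getBytes());   // a plain object named a/b
        mkdirs("a/b/c");                       // a "directory" under it
        // Both coexist: no conflict at the store level.
        System.out.println(store.containsKey("a/b") && store.containsKey("a/b/c/"));
    }
}
```

The point of the sketch is cost and permissions, not correctness: the single-PUT version makes one API call regardless of path depth, and never touches prefixes the caller may not be authorized to read.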

This message was sent by Atlassian JIRA

