hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-11452) Revisit FileSystem#rename
Date Mon, 05 Jan 2015 11:17:34 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-11452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264495#comment-14264495 ]

Steve Loughran commented on HADOOP-11452:

# We can't remove {{rename()}}. People would be surprised and upset. Therefore "constrain"
is not something you can now mandate. Sorry.
# We can declare that it SHOULD be atomic, which is precisely what we do in [the FS specs|http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/introduction.html]
# Some extensions to object stores (e.g. Netflix S3mper) do retrofit atomicity to rename operations,
so they can be used as the destination of speculative operations.
# HADOOP-9565 proposes moving filesystems that are really object stores under a {{BlobStore}}
subclass of {{FileSystem}} and offering a way to get a bitmask of consistency and atomicity
features. The design is intended to allow subclasses (e.g. S3mper) to override semantics, and
alternate S3 and Swift service providers to offer stricter semantics.
# Code that wants to explicitly check for the required semantics could then look for this
interface and, if present, get the semantics. Actually, if we really care, we may want to
push it to the base FS class -as a barrier for apps, not as a way for them to work around
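
To make the "look for this interface and, if present, get the semantics" idea concrete, here is a minimal sketch. Every name in it ({{StoreFeature}}, {{FeatureProbe}}, {{FeatureCheck}}) is hypothetical and none of them exists in Hadoop; an {{EnumSet}} stands in for the bitmask that HADOOP-9565 talks about.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical feature flags; an EnumSet plays the role of the bitmask.
enum StoreFeature { ATOMIC_RENAME, ATOMIC_DIRECTORY_DELETE, CONSISTENT_LISTING }

// Hypothetical interface a filesystem or blobstore client could implement.
interface FeatureProbe {
    Set<StoreFeature> features();
}

final class FeatureCheck {
    private FeatureCheck() {}

    // Returns true only when the store explicitly declares atomic rename.
    // A store that does not implement the interface is treated as "unknown",
    // which this sketch conservatively maps to false.
    static boolean declaresAtomicRename(Object store) {
        if (store instanceof FeatureProbe) {
            return ((FeatureProbe) store).features().contains(StoreFeature.ATOMIC_RENAME);
        }
        return false;
    }

    // Example declaration a stricter store might return.
    static Set<StoreFeature> strictStoreFeatures() {
        return EnumSet.of(StoreFeature.ATOMIC_RENAME, StoreFeature.CONSISTENT_LISTING);
    }
}
```

Code that needs atomic rename, such as a committer for speculative work, would call {{FeatureCheck.declaresAtomicRename(fs)}} and refuse to proceed on a false result instead of assuming the semantics.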

If you look at {{FileSystem#protected void rename(final Path src, final Path dst, final Rename...
options)}} in detail, you can see that apart from HDFS/webhdfs we aren't implementing rename()
atomically. Specifically:
# we look for the file existing
# raise an error if the condition (source is dir, dest is file) && overwrite==false
# then rename

There's a race condition between the stat and rename. Maybe now that we support Java7+ only
we can think about using native IO operations which offer better atomicity.
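
As a rough illustration on the local file system only (this is not the Hadoop code, and the class and method names are made up for the example), the sketch below contrasts the check-then-rename pattern with a single {{java.nio.file.Files.move}} call using {{StandardCopyOption.ATOMIC_MOVE}}, which is available from Java 7.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class, local file system only.
final class RenameSketch {
    private RenameSketch() {}

    // Check-then-act: another process can create dst after the existence
    // check and before the move, in which case REPLACE_EXISTING silently
    // overwrites it even though overwrite == false was requested.
    static void racyRename(Path src, Path dst, boolean overwrite) throws IOException {
        if (!overwrite && Files.exists(dst)) {                      // stat
            throw new FileAlreadyExistsException(dst.toString());
        }
        Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);  // rename
    }

    // One request to the file system with no separate existence check.
    // Throws AtomicMoveNotSupportedException if the move cannot be atomic,
    // for example across file stores.
    static void atomicRename(Path src, Path dst) throws IOException {
        Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Note that with {{ATOMIC_MOVE}} the behaviour when the destination already exists is implementation specific: it may be replaced or the call may fail with an {{IOException}}. So this narrows the race, but it does not by itself give portable rename-without-overwrite semantics.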

We can talk about how to expose this stuff, which is something you should raise on the HDFS list.
Hadoop-common may be where the APIs live, but it's hadoop-dev that owns the semantics.

Finally, regarding {{FileSystemRMStateStore}}: if that were moved to {{FileContext}}, it would get
the public APIs. YARN already depends on implementations of {{AbstractFileSystem}}.

> Revisit FileSystem#rename
> -------------------------
>                 Key: HADOOP-11452
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11452
>             Project: Hadoop Common
>          Issue Type: Task
>          Components: fs
>            Reporter: Yi Liu
>            Assignee: Yi Liu
> Currently in {{FileSystem}}, {{rename}} with _Rename options_ is protected and with _deprecated_
annotation. And the default implementation is not atomic.
> So this method is not able to be used outside. On the other hand, HDFS has a good and
atomic implementation. (Also an interesting thing in {{DFSClient}}, the _deprecated_ annotations
for these two methods are opposite).
> It makes sense to make public for {{rename}} with _Rename options_, since it's atomic
for rename+overwrite, also it saves RPC calls if user desires rename+overwrite.

This message was sent by Atlassian JIRA
