hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron Fabbri (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15229) Add FileSystem builder-based openFile() API to match createFile()
Date Fri, 30 Nov 2018 21:21:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16705291#comment-16705291

Aaron Fabbri commented on HADOOP-15229:

Looks good overall [~stevel@apache.org]. Good suggestions here too. I'd also lean towards
an interface + composition for common code but defer to you.

As usual, good tests, good docs. Thank you. I prefer separate patches but then again good
to see the end-to-end example of how this all comes together with CLI tool -> builder ->
S3A implementation.

Random comments on the patch. Minor wording nits are suggestions,,
+   * Reject a configuration if one or more mandatory keys are
+   * not in the set of mandatory keys.
/not in the set of known keys/

I think you can remove the whitespace change to HarFileSystem.java

The async API is interesting. Folks who want synchronous should wrap with LambdaUtils' eval()?
I generally favor async apis, but we should keep in mind we can always translate between sync
and async, and the common case should be the default, unless there is another benefit (e.g.
performance, error behavior). Just some thoughts here. I'm ok with your async API.
+The `openFile()` operation may check the state of the filesystem during this
+call, but as the state of the filesystem may change betwen this call and
+the actual `build()` and `get()` operations, this file-specific
+preconditions (file exists, file is readable, etc) MUST NOT be checked here.{noformat}
Took me a second to parse this but you clarify later on. Maybe change last word "here" to
"in openFile()". Also "this file-specific" to "any file-specific". Also spelling on "betwen".
 some aspects of the state of the filesystem, MAY be checked in the initial 
+`openFile()` call, provided they are known to be invariants which will not
+change between `openFile()` and the `build().get()` sequence. For example,
+path validation.{noformat}
Makes sense. Path validation can happen in openFile but path existence, for example, must

Thanks for pulling SelectBinding stuff into separate class. Haven't tested yet.. I'm self-funding
my AWS usage these days and I need to set that up still.

> Add FileSystem builder-based openFile() API to match createFile()
> -----------------------------------------------------------------
>                 Key: HADOOP-15229
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15229
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: fs, fs/azure, fs/s3
>    Affects Versions: 3.0.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Major
>         Attachments: HADOOP-15229-001.patch, HADOOP-15229-002.patch, HADOOP-15229-003.patch,
HADOOP-15229-004.patch, HADOOP-15229-004.patch, HADOOP-15229-005.patch, HADOOP-15229-006.patch
> Replicate HDFS-1170 and HADOOP-14365 with an API to open files.
> A key requirement of this is not HDFS, it's to put in the fadvise policy for working
with object stores, where getting the decision to do a full GET and TCP abort on seek vs smaller
GETs is fundamentally different: the wrong option can cost you minutes. S3A and Azure both
have adaptive policies now (first backward seek), but they still don't do it that well.
> Columnar formats (ORC, Parquet) should be able to say "fs.input.fadvise" "random" as
an option when they open files; I can imagine other options too.
> The Builder model of [~eddyxu] is the one to mimic, method for method. Ideally with as
much code reuse as possible

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

View raw message