hadoop-common-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13560) S3ABlockOutputStream to support huge (many GB) file writes
Date Fri, 14 Oct 2016 09:22:25 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15574764#comment-15574764

ASF GitHub Bot commented on HADOOP-13560:

Github user thodemoor commented on a diff in the pull request:

    --- Diff: hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md ---
    @@ -881,40 +881,362 @@ Seoul
     If the wrong endpoint is used, the request may fail. This may be reported as a 301/redirect
     or as a 400 Bad Request.
    -### S3AFastOutputStream
    - **Warning: NEW in hadoop 2.7. UNSTABLE, EXPERIMENTAL: use at own risk**
    -    <property>
    -      <name>fs.s3a.fast.upload</name>
    -      <value>false</value>
    -      <description>Upload directly from memory instead of buffering to
    -      disk first. Memory usage and parallelism can be controlled as up to
    -      fs.s3a.multipart.size memory is consumed for each (part)upload actively
    -      uploading (fs.s3a.threads.max) or queueing (fs.s3a.max.total.tasks)</description>
    -    </property>
    -    <property>
    -      <name>fs.s3a.fast.buffer.size</name>
    -      <value>1048576</value>
    -      <description>Size (in bytes) of initial memory buffer allocated for an
    -      upload. No effect if fs.s3a.fast.upload is false.</description>
    -    </property>
    +### <a name="s3a_fast_upload"></a>Stabilizing: S3A Fast Upload
    +**New in Hadoop 2.7; significantly enhanced in Hadoop 2.9**
    +Because of the nature of the S3 object store, data written to an S3A `OutputStream` 
    +is not written incrementally; instead, by default, it is buffered to disk
    +until the stream is closed in its `close()` method. 
    +This can make output slow:
    +* The execution time for `OutputStream.close()` is proportional to the amount of data
    +buffered and inversely proportional to the bandwidth. That is `O(data/bandwidth)`.
    +* The bandwidth is that available from the host to S3: other work in the same
    +process, server or network at the time of upload may increase the upload time,
    +hence the duration of the `close()` call.
    +* If a process uploading data fails before `OutputStream.close()` is called,
    +all data is lost.
    +* The disks hosting temporary directories defined in `fs.s3a.buffer.dir` must
    +have the capacity to store the entire buffered file.
    +Put succinctly: the further the process is from the S3 endpoint, or the smaller
    +the EC2-hosted VM is, the longer it will take for work to complete.
    +This can create problems in application code:
    +* Code often assumes that the `close()` call is fast;
    + the delays can create bottlenecks in operations.
    +* Very slow uploads sometimes cause applications to time out. (Generally,
    +threads blocking during the upload stop reporting progress, which triggers timeouts.)
    +* Streaming very large amounts of data may consume all disk space before the upload begins.
    +Work to address this began in Hadoop 2.7 with the `S3AFastOutputStream`
    +[HADOOP-11183](https://issues.apache.org/jira/browse/HADOOP-11183), and
    +has continued with `S3ABlockOutputStream`.
    +This adds an alternative output stream, "S3A Fast Upload", which:
    +1.  Always uploads large files as blocks with the size set by
    +    `fs.s3a.multipart.size`. That is: the threshold at which multipart uploads
    +    begin and the size of each upload are identical.
    +1.  Buffers blocks to disk (default) or in on-heap or off-heap memory.
    +1.  Uploads blocks in parallel in background threads.
    +1.  Begins uploading blocks as soon as the buffered data exceeds this partition
    +    size.
    +1.  When buffering data to disk, uses the directory/directories listed in
    +    `fs.s3a.buffer.dir`. The size of data which can be buffered is limited
    +    to the available disk space.
    +1.  Generates output statistics as metrics on the filesystem, including
    +    statistics of active and pending block uploads.
    +1.  Has the time to `close()` set by the amount of remaining data to upload, rather
    +    than the total size of the file.
    +With incremental writes of blocks, "S3A fast upload" offers an upload
    +time at least as fast as the "classic" mechanism, with significant benefits
    +on long-lived output streams, and when very large amounts of data are generated.
    +The in-memory buffering mechanisms may also offer speedup when running adjacent to
    +S3 endpoints, as disks are not used for intermediate data storage.
    +  <name>fs.s3a.fast.upload</name>
    +  <value>true</value>
    +  <description>
    +    Use the incremental block upload mechanism with
    +    the buffering mechanism set in fs.s3a.fast.upload.buffer.
    +    The number of threads performing uploads in the filesystem is defined
    +    by fs.s3a.threads.max; the queue of waiting uploads limited by
    +    fs.s3a.max.total.tasks.
    +    The size of each buffer is set by fs.s3a.multipart.size.
    +  </description>
    +  <name>fs.s3a.fast.upload.buffer</name>
    +  <value>disk</value>
    +  <description>
    +    The buffering mechanism to use when using S3A fast upload
    +    (fs.s3a.fast.upload=true). Values: disk, array, bytebuffer.
    +    This configuration option has no effect if fs.s3a.fast.upload is false.
    +    "disk" will use the directories listed in fs.s3a.buffer.dir as
    +    the location(s) to save data prior to being uploaded.
    +    "array" uses arrays in the JVM heap.
    +    "bytebuffer" uses off-heap memory within the JVM.
    +    Both "array" and "bytebuffer" will consume memory in a single stream up to the number
    +    of bytes set by:
    +        fs.s3a.multipart.size * fs.s3a.fast.upload.active.blocks.
    +    If using either of these mechanisms, keep this value low.
    +    The total number of threads performing work across all threads is set by
    +    fs.s3a.threads.max, with fs.s3a.max.total.tasks values setting the number of queued
    +    work items.
    --- End diff --
    idem as in pom.xml
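
For reference, the memory bound in the `fs.s3a.fast.upload.buffer` description and the `O(data/bandwidth)` cost of `close()` quoted in the diff can be checked with simple arithmetic. The sketch below is illustrative only: the class and method names are hypothetical (they are not part of hadoop-aws), and the block size, active-block count, file size, and bandwidth are assumed values, not defaults taken from the patch.

```java
// Hypothetical helper, not part of hadoop-aws: back-of-envelope estimates only.
public final class S3AUploadEstimates {

    private S3AUploadEstimates() {
    }

    /**
     * Worst-case bytes buffered by a single stream when using "array" or
     * "bytebuffer": fs.s3a.multipart.size * fs.s3a.fast.upload.active.blocks.
     */
    public static long maxBufferedBytesPerStream(long multipartSizeBytes, int activeBlocks) {
        if (multipartSizeBytes < 0 || activeBlocks < 0) {
            throw new IllegalArgumentException("sizes and counts must be non-negative");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(multipartSizeBytes, (long) activeBlocks);
    }

    /** Time to upload the given amount of data: data / bandwidth, in seconds. */
    public static double uploadSeconds(long bytes, double bytesPerSecond) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bandwidth must be positive");
        }
        return bytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        long blockSize = 100 * mb;        // assumed fs.s3a.multipart.size
        int activeBlocks = 4;             // assumed fs.s3a.fast.upload.active.blocks
        long fileSize = 10 * 1024 * mb;   // assumed 10 GB output file
        double bandwidth = 50.0 * mb;     // assumed 50 MB/s to S3

        long buffered = maxBufferedBytesPerStream(blockSize, activeBlocks);
        System.out.println("Max buffered bytes per stream: " + buffered);

        // Classic stream: close() uploads the whole file.
        System.out.println("Classic close() seconds: " + uploadSeconds(fileSize, bandwidth));

        // Block upload: close() only waits for data not yet uploaded. Assumption:
        // at most the active blocks are still pending when close() is called.
        System.out.println("Block-upload close() seconds (upper bound under that assumption): "
            + uploadSeconds(buffered, bandwidth));
    }
}
```

Under these assumed numbers, the per-stream memory ceiling is 400 MB, which is why the description advises keeping the active-block count low when buffering in memory.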

> S3ABlockOutputStream to support huge (many GB) file writes
> ----------------------------------------------------------
>                 Key: HADOOP-13560
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13560
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13560-branch-2-001.patch, HADOOP-13560-branch-2-002.patch, HADOOP-13560-branch-2-003.patch, HADOOP-13560-branch-2-004.patch
>
> An AWS SDK [issue|https://github.com/aws/aws-sdk-java/issues/367] highlights that metadata isn't copied on large copies.
> 1. Add a test to do that large copy/rename and verify that the copy really works.
> 2. Verify that metadata makes it over.
> Verifying large file rename is important on its own, as it is needed for very large commit operations for committers using rename.

This message was sent by Atlassian JIRA
