hadoop-common-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-13560) S3ABlockOutputStream to support huge (many GB) file writes
Date Fri, 14 Oct 2016 12:46:20 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-13560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15575234#comment-15575234 ]

ASF GitHub Bot commented on HADOOP-13560:

Github user steveloughran commented on a diff in the pull request:

    --- Diff: hadoop-common-project/hadoop-common/src/main/resources/core-default.xml ---
    @@ -1095,10 +1102,50 @@
    -  <description>Upload directly from memory instead of buffering to
    -    disk first. Memory usage and parallelism can be controlled as up to
    -    fs.s3a.multipart.size memory is consumed for each (part)upload actively
    -    uploading (fs.s3a.threads.max) or queueing (fs.s3a.max.total.tasks)</description>
    +  <description>
    +    Use the incremental block-based fast upload mechanism with
    +    the buffering mechanism set in fs.s3a.fast.upload.buffer.
    +  </description>
    +  <name>fs.s3a.fast.upload.buffer</name>
    +  <value>disk</value>
    +  <description>
    +    The buffering mechanism to use when using S3A fast upload
    +    (fs.s3a.fast.upload=true). Values: disk, array, bytebuffer.
    +    This configuration option has no effect if fs.s3a.fast.upload is false.
    +    "disk" will use the directories listed in fs.s3a.buffer.dir as
    +    the location(s) to save data prior to being uploaded.
    +    "array" uses arrays in the JVM heap
    +    "bytebuffer" uses off-heap memory within the JVM.
    +    Both "array" and "bytebuffer" will consume memory in a single stream up to the number
    +    of blocks set by:
    +        fs.s3a.multipart.size * fs.s3a.fast.upload.active.blocks.
    +    If using either of these mechanisms, keep this value low
    +    The total number of threads performing work across all threads is set by
    +    fs.s3a.threads.max, with fs.s3a.max.total.tasks values setting the number of queued
    +    work items.
    --- End diff --
    you know, now that you can have a queue per stream, it could be set to something
    bigger. This is something we could look at in the docs, leaving it out of the XML so as
    to have a single topic. This phrase here describes the number of active threads, which
    is different, and will be more so once there's other work (COPY, DELETE) going on there.
    So: won't change here
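
The description in the diff above states that the "array" and "bytebuffer" options consume memory per stream up to fs.s3a.multipart.size * fs.s3a.fast.upload.active.blocks. A minimal sketch of that worst-case arithmetic, using illustrative values rather than verified release defaults:

```java
// Hedged sketch: worst-case buffer memory per open output stream when using
// the "array" or "bytebuffer" options. The concrete numbers below are
// illustrative assumptions, not defaults checked against any Hadoop release.
public class S3ABufferEstimate {

    // Memory ceiling for one stream: block size times active block limit.
    static long perStreamBytes(long multipartSize, int activeBlocks) {
        return multipartSize * activeBlocks;
    }

    public static void main(String[] args) {
        long multipartSize = 100L * 1024 * 1024; // fs.s3a.multipart.size = 100M (example)
        int activeBlocks = 4;                    // fs.s3a.fast.upload.active.blocks (example)
        // 100M * 4 = 400 MiB buffered per stream in the worst case.
        System.out.println(perStreamBytes(multipartSize, activeBlocks));
    }
}
```

With many streams open at once, the ceiling multiplies accordingly, which is why the description advises keeping the value low for the in-memory mechanisms.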

> S3ABlockOutputStream to support huge (many GB) file writes
> ----------------------------------------------------------
>                 Key: HADOOP-13560
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13560
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.9.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>         Attachments: HADOOP-13560-branch-2-001.patch, HADOOP-13560-branch-2-002.patch,
> HADOOP-13560-branch-2-003.patch, HADOOP-13560-branch-2-004.patch
> An AWS SDK [issue|https://github.com/aws/aws-sdk-java/issues/367] highlights that metadata
> isn't copied on large copies.
> 1. Add a test to do the large copy/rename and verify that the copy really works.
> 2. Verify that the metadata makes it over.
> Verifying large file rename is important on its own, as it is needed for very large commit
> operations for committers using rename.
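Step 2 of the issue, checking that metadata survives a large copy, boils down to comparing the user-metadata entries on the source and destination objects. A hedged sketch of that check, modelling the metadata as plain maps since the real test would read it via the S3 client used by the harness:

```java
import java.util.Map;

// Hedged sketch of the metadata check from the issue description: every
// user-metadata entry on the source object should be present, with the same
// value, on the destination after a large copy/rename. The maps here stand in
// for whatever object-header lookup the actual test uses.
public class MetadataCheck {

    static boolean metadataPreserved(Map<String, String> src, Map<String, String> dst) {
        // True iff each (key, value) pair of the source survives on the destination.
        return dst.entrySet().containsAll(src.entrySet());
    }

    public static void main(String[] args) {
        Map<String, String> src = Map.of("x-amz-meta-owner", "steve");
        System.out.println(metadataPreserved(src,
                Map.of("x-amz-meta-owner", "steve", "extra", "v"))); // preserved
        System.out.println(metadataPreserved(src, Map.of()));        // lost
    }
}
```

The real test would additionally assert on object length and, per step 1, that the copied data is actually readable.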

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
