hadoop-common-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15107) Prove the correctness of the new committers, or fix where they are not correct
Date Fri, 05 Jan 2018 20:01:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313776#comment-16313776 ]

Steve Loughran commented on HADOOP-15107:
-----------------------------------------

Committers can reduce the load on shards by shuffling their requests a bit:

* Staging task commit: schedule the largest file first, then shuffle the rest. This ensures that the biggest file isn't the straggler; the rest go in whatever order the shuffle gives.
* Job commit, all committers: shuffle the list of pending files.
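
A minimal sketch of the two orderings above, written in Python for brevity and not taken from the committer code. The `(key, size)` tuple standing in for a pending file and the function names `order_for_task_commit` and `order_for_job_commit` are assumptions made for illustration only.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical stand-in for a pending upload: (destination key, size in bytes).
PendingFile = Tuple[str, int]


def order_for_task_commit(files: List[PendingFile],
                          rng: Optional[random.Random] = None) -> List[PendingFile]:
    """Return the largest file first, followed by the rest in random order."""
    if rng is None:
        rng = random.Random()
    if not files:
        return []
    # Index of the largest file; ties resolve to the earliest entry.
    largest = max(range(len(files)), key=lambda i: files[i][1])
    rest = files[:largest] + files[largest + 1:]
    rng.shuffle(rest)
    return [files[largest]] + rest


def order_for_job_commit(pending: List[PendingFile],
                         rng: Optional[random.Random] = None) -> List[PendingFile]:
    """Return a shuffled copy of the pending list; the input is left untouched."""
    if rng is None:
        rng = random.Random()
    shuffled = list(pending)
    rng.shuffle(shuffled)
    return shuffled
```

Passing a seeded `random.Random` makes the order reproducible in tests; in normal use the default unseeded generator is what spreads the requests around.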


> Prove the correctness of the new committers, or fix where they are not correct
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-15107
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15107
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> I'm writing the paper on the committers, one which, being a proper paper, requires me to show that the committers work.
> # Define the requirements of a "Correct" committed job (this applies to the FileOutputCommitter too).
> # Show that the Staging committer meets these requirements (most of this is implicit, in that it uses the V1 FileOutputCommitter to marshal .pendingset lists from committed tasks to the final destination, where they are read and committed).
> # Show that the magic committer also works.
> I'm now not sure that the magic committer works.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

