hadoop-common-issues mailing list archives

From "Hadoop QA (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-14943) Add common getFileBlockLocations() emulation for object stores, including S3A
Date Fri, 30 Apr 2021 19:44:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-14943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337593#comment-17337593 ]

Hadoop QA commented on HADOOP-14943:
------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m 10s{color} | {color:red}{color} | {color:red} HADOOP-14943 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | HADOOP-14943 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12910618/HADOOP-14943-004.patch |
| Console output | https://ci-hadoop.apache.org/job/PreCommit-HADOOP-Build/188/console |
| versions | git=2.17.1 |
| Powered by | Apache Yetus 0.13.0-SNAPSHOT https://yetus.apache.org |


This message was automatically generated.



> Add common getFileBlockLocations() emulation for object stores, including S3A
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-14943
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14943
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.1
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>         Attachments: HADOOP-14943-001.patch, HADOOP-14943-002.patch, HADOOP-14943-002.patch, HADOOP-14943-003.patch, HADOOP-14943-004.patch
>
>
> It looks suspiciously like S3A isn't providing the partitioning data in {{listLocatedStatus}} and {{getFileBlockLocations()}} that is needed to break up a file by the block size. This will stop tools using the MRv1 APIs from partitioning properly if the input format isn't doing its own split logic.
> FileInputFormat in MRv2 is a bit more configurable about input split calculation and will split up large files. Otherwise, the partitioning is driven more by the default values of the executing engine than by any config data from the filesystem about what its "block size" is.
> NativeAzureFS does a better job; maybe that could be factored out to hadoop-common and reused?
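
The kind of emulation the description asks for can be sketched as below. This is a minimal, dependency-free illustration and not the code in the attached patches: the class {{BlockLocationEmulation}}, the method {{emulate}} and the type {{Range}} are hypothetical names, {{Range}} is only a stand-in for Hadoop's BlockLocation, and reporting every block on the placeholder host "localhost" is an assumption about how an object store with no real data locality might answer.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockLocationEmulation {

    /** Hypothetical stand-in for a block location: one host, an offset and a length. */
    public static final class Range {
        public final String host;
        public final long offset;
        public final long length;

        Range(String host, long offset, long length) {
            this.host = host;
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Split the requested byte range [start, start + len) of a file into
     * fake "blocks" aligned to blockSize. Every block that overlaps the
     * range is returned whole, with the last block trimmed to the file length.
     */
    public static List<Range> emulate(long fileLength, long blockSize, long start, long len) {
        if (start < 0 || len < 0) {
            throw new IllegalArgumentException("start and len must be non-negative");
        }
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<Range> blocks = new ArrayList<>();
        if (start >= fileLength || len == 0) {
            return blocks; // nothing overlaps the file
        }
        // Clamp the end of the range to the file length without overflowing.
        long end = (len > fileLength - start) ? fileLength : start + len;
        // Begin at the block boundary at or before the requested start.
        long firstOffset = (start / blockSize) * blockSize;
        for (long offset = firstOffset; offset < end; offset += blockSize) {
            long length = Math.min(blockSize, fileLength - offset);
            // Assumption: object stores have no locality, so use a placeholder host.
            blocks.add(new Range("localhost", offset, length));
        }
        return blocks;
    }
}
```

With a 100-byte file and a 32-byte block size, asking for the whole file yields four ranges, the last one covering only the final 4 bytes. A split calculator that consumes such ranges can then partition a large object by the configured block size instead of treating it as a single block.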



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
