flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2121) FileInputFormat.addFilesInDir miscalculates total size
Date Sun, 31 May 2015 11:33:17 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566522#comment-14566522 ]

ASF GitHub Bot commented on FLINK-2121:

GitHub user ggevay opened a pull request:


    [FLINK-2121] Fix the summation in FileInputFormat.addFilesInDir

    Removed the length parameter, and made the length calculation start from 0 instead.
    I also added a second inner dir to the test, so that it now catches this problem with any
    directory listing order.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ggevay/flink dirSizeFix

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #752

commit 7fc86ce10ddc640126c7da8265403a815a30c2d2
Author: Gabor Gevay <ggab90@gmail.com>
Date:   2015-05-31T11:27:15Z

    [FLINK-2121] Fix the recursive summation in FileInputFormat.addFilesInDir


> FileInputFormat.addFilesInDir miscalculates total size
> ------------------------------------------------------
>                 Key: FLINK-2121
>                 URL: https://issues.apache.org/jira/browse/FLINK-2121
>             Project: Flink
>          Issue Type: Bug
>          Components: Core
>            Reporter: Gabor Gevay
>            Assignee: Gabor Gevay
>            Priority: Minor
> In FileInputFormat.addFilesInDir, the length variable should start from 0, because the
> caller always adds the return value to its own length (instead of just assigning it). So with
> the current version, the length accumulated before the call is counted twice in the result.
> mvn verify caught this for me now. This has not been noticed before because
> testGetStatisticsMultipleNestedFiles catches it only if the listing of the outer
> directory comes back in a certain order. Concretely, if the inner directory is seen before
> the other file in the outer directory, then length is 0 at that point, so the bug doesn't
> show. But if the other file is seen first, then its size is added twice to the total result.
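
The order dependence described above can be reproduced with a small in-memory model. This is only a sketch of the shape of the bug, not the actual Flink code: the class DirSizeSketch, the Node type, and the methods buggySum and fixedSum are hypothetical names introduced for illustration, and the sizes 5 and 10 are arbitrary.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of a recursive directory size summation.
class DirSizeSketch {

    /** Stand-in for a file or directory entry (not a Flink type). */
    static final class Node {
        final long size;            // meaningful for files only
        final List<Node> children;  // null for files

        private Node(long size, List<Node> children) {
            this.size = size;
            this.children = children;
        }

        static Node file(long size) { return new Node(size, null); }
        static Node dir(Node... children) { return new Node(0, Arrays.asList(children)); }
        boolean isDir() { return children != null; }
    }

    // Buggy shape: the running length is passed down AND the return value is added to it,
    // so whatever was accumulated before the recursive call is counted twice.
    static long buggySum(Node dir, long length) {
        for (Node child : dir.children) {
            if (child.isDir()) {
                length += buggySum(child, length);
            } else {
                length += child.size;
            }
        }
        return length;
    }

    // Fixed shape: no length parameter; each call starts counting from 0.
    static long fixedSum(Node dir) {
        long length = 0;
        for (Node child : dir.children) {
            length += child.isDir() ? fixedSum(child) : child.size;
        }
        return length;
    }

    public static void main(String[] args) {
        Node dirFirst  = Node.dir(Node.dir(Node.file(10)), Node.file(5));
        Node fileFirst = Node.dir(Node.file(5), Node.dir(Node.file(10)));

        System.out.println(buggySum(dirFirst, 0));   // 15: length is 0 at the recursive call
        System.out.println(buggySum(fileFirst, 0));  // 20: the 5 already counted is added again
        System.out.println(fixedSum(fileFirst));     // 15
    }
}
```

With the inner directory listed first, the buggy version happens to return the right total, which matches the report that the test only failed for one listing order.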

This message was sent by Atlassian JIRA
