spark-dev mailing list archives

From Jacek Laskowski
Subject FileSystem.getContentSummary for total size stats in DetermineTableStats VS CommandUtils?
Date Tue, 02 Jan 2018 09:45:45 GMT

I was wondering what's wrong with FileSystem.getContentSummary
in CommandUtils.calculateLocationSize as "expressed" in the comment [1]:

    // This method is mainly based on
    // in Hive 0.13 (except that we do not use fs.getContentSummary).
    // TODO: Generalize statistics collection.
    // TODO: Why fs.getContentSummary returns wrong size on Jenkins?
    // Can we use fs.getContentSummary in future?
    // Seems fs.getContentSummary returns wrong table size on Jenkins. So we use
    // countFileSize to count the table size.

until I found out that there seems to be no issue whatsoever
since DetermineTableStats uses it just fine [2].

Why does CommandUtils.calculateLocationSize *not* use what
DetermineTableStats does successfully?
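
For reference, the two ways of computing a location's total size that the question contrasts can be sketched side by side. This is a minimal illustration against the Hadoop FileSystem API, not the code of CommandUtils or DetermineTableStats; the object name SizeComparison and both method names are hypothetical.

```scala
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper object, for illustration only.
object SizeComparison {

  // Approach 1: ask the file system for an aggregate summary of
  // everything under `path` and read the total length from it.
  def sizeViaContentSummary(fs: FileSystem, path: Path): Long =
    fs.getContentSummary(path).getLength

  // Approach 2: walk the directory tree by hand and add up the
  // lengths of the individual files.
  def sizeViaListing(fs: FileSystem, path: Path): Long = {
    val status = fs.getFileStatus(path)
    if (status.isDirectory) {
      fs.listStatus(path)
        .map(child => sizeViaListing(fs, child.getPath))
        .sum
    } else {
      status.getLen
    }
  }
}
```

On a plain directory tree both functions should report the same number of bytes. The manual walk leaves room to skip or special-case individual entries, which the aggregate call does not; whether that is related to the Jenkins discrepancy is not something the quoted comment says.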



Jacek Laskowski
Mastering Spark SQL
Spark Structured Streaming
Mastering Kafka Streams
Follow me at
