spark-dev mailing list archives

From zhangliyun <kelly...@126.com>
Subject A question about RDD bytes size
Date Mon, 02 Dec 2019 05:05:52 GMT
Hi:


 I want to get the total size in bytes of a DataFrame with the following function, but when I insert the
DataFrame into Hive, I find that the value returned by the function differs from spark.sql.statistics.totalSize:
spark.sql.statistics.totalSize is less than the result of getRDDBytes below.


   // A rough measure of a DataFrame's size: the sum of the UTF-8 byte length
   // of each row's string representation.
   def getRDDBytes(df: DataFrame): Long = {
     df.rdd.getNumPartitions match {
       case 0 =>
         0L
       case _ =>
         val rddOfDataframe = df.rdd.map(_.toString().getBytes("UTF-8").length.toLong)
         if (rddOfDataframe.isEmpty()) 0L else rddOfDataframe.reduce(_ + _)
     }
   }
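
For reference, here is a minimal sketch of how I compare the two values side by side. It assumes a Hive-backed table (the name my_table is just a placeholder) and that the statistic has been written to the metastore as the table property spark.sql.statistics.totalSize:

   import org.apache.spark.sql.SparkSession
   import org.apache.spark.sql.functions.col

   object CompareSizes {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .appName("compare-sizes")
         .enableHiveSupport()
         .getOrCreate()

       val df = spark.table("my_table")

       // Same quantity as getRDDBytes above: total UTF-8 bytes of each row's
       // string representation (fold handles the empty-RDD case).
       val rowStringBytes =
         df.rdd.map(_.toString().getBytes("UTF-8").length.toLong).fold(0L)(_ + _)

       // Statistic recorded as a table property after the insert; head() will
       // fail if the property has not been written yet.
       val totalSize = spark.sql("SHOW TBLPROPERTIES my_table")
         .filter(col("key") === "spark.sql.statistics.totalSize")
         .select("value")
         .head()
         .getString(0)
         .toLong

       println(s"row-string bytes = $rowStringBytes, spark.sql.statistics.totalSize = $totalSize")
     }
   }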
I would appreciate any suggestions you can provide.


Best Regards
Kelly Zhang
