spark-user mailing list archives

From Xin Liu <xin.e....@gmail.com>
Subject SizeEstimator
Date Tue, 27 Feb 2018 00:47:07 GMT
Hi folks,

We have a situation where the shuffled data is protobuf-based, and
SizeEstimator is taking a lot of time.

We tried overriding SizeEstimator to return a constant value, which
speeds things up a lot.

My question: what are the side effects of disabling SizeEstimator? Is it
just that Spark has to reallocate memory, or are there more severe consequences?

Thanks!
