spark-user mailing list archives

From 叶先进 <advance...@gmail.com>
Subject Re: SizeEstimator
Date Tue, 27 Feb 2018 02:33:21 GMT
Hi Xin Liu,

Could you provide a concrete use case if possible (code to reproduce the protobuf object, and
a comparison between the protobuf object and a normal object)?

I contributed a bit to SizeEstimator years ago, and to my understanding, the time complexity
should be O(N), where N is the number of fields reached by recursively following references.
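
To illustrate what I mean by O(N), here is a minimal Java sketch of a reflective object-graph walk. It is not Spark's actual code: the class GraphWalkSketch, the Node type, and the helpers countReachable and chain are hypothetical names made up for this example, and it only counts objects instead of estimating bytes. The point is that an identity-based visited set makes each reachable object get processed once, so the work grows linearly with the size of the graph.

```java
import java.lang.reflect.Array;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Set;

public class GraphWalkSketch {
    // Hypothetical example type: a singly linked chain of nodes.
    static final class Node {
        int payload;
        Node next;
        Node(int payload, Node next) { this.payload = payload; this.next = next; }
    }

    // Counts the distinct objects reachable from root, visiting each one once.
    static int countReachable(Object root) {
        // Identity-based set: two references to the same object count once.
        Set<Object> visited =
            Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        Deque<Object> pending = new ArrayDeque<Object>();
        enqueue(root, visited, pending);
        while (!pending.isEmpty()) {
            Object current = pending.pop();
            Class<?> cls = current.getClass();
            if (cls.isArray()) {
                // Only arrays of references contribute further objects.
                if (!cls.getComponentType().isPrimitive()) {
                    int length = Array.getLength(current);
                    for (int i = 0; i < length; i++) {
                        enqueue(Array.get(current, i), visited, pending);
                    }
                }
                continue;
            }
            // Assumption for portability: JDK classes are treated as leaves,
            // so the sketch does not reflect into java.* internals.
            if (cls.getName().startsWith("java.")) {
                continue;
            }
            // Follow every non-static reference field, including inherited ones.
            for (Class<?> c = cls; c != null && c != Object.class; c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) {
                        continue;
                    }
                    try {
                        f.setAccessible(true);
                        enqueue(f.get(current), visited, pending);
                    } catch (IllegalAccessException | RuntimeException e) {
                        // Field is not accessible in this runtime; skip it.
                    }
                }
            }
        }
        return visited.size();
    }

    // Adds value to the work list only if it has not been seen before.
    private static void enqueue(Object value, Set<Object> visited, Deque<Object> pending) {
        if (value != null && visited.add(value)) {
            pending.push(value);
        }
    }

    // Builds a chain of n nodes and returns its head (null when n is 0).
    static Node chain(int n) {
        Node head = null;
        for (int i = 0; i < n; i++) {
            head = new Node(i, head);
        }
        return head;
    }

    public static void main(String[] args) {
        // Ten times more nodes means ten times more objects to visit.
        System.out.println(countReachable(chain(1_000)));
        System.out.println(countReachable(chain(10_000)));
    }
}
```

If protobuf objects turn out to be slow, my guess would be that N itself is much larger than expected for them (many nested or shared internal objects), rather than the per-object cost being superlinear, but that needs to be confirmed with a reproduction.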

We should definitely investigate this case if it indeed takes a lot of time on protobuf objects.

> On 27 Feb 2018, at 8:47 AM, Xin Liu <xin.e.liu@gmail.com> wrote:
> 
> Hi folks,
> 
> We have a situation where the shuffled data is protobuf-based and SizeEstimator is taking
> a lot of time.
> 
> We have tried to override SizeEstimator to return a constant value, which speeds things up
> a lot.
> 
> My question: what is the side effect of disabling SizeEstimator? Is it just that Spark does
> memory reallocation, or are there more severe consequences?
> 
> Thanks!

