spark-user mailing list archives

From Xin Liu <xin.e....@gmail.com>
Subject Re: SizeEstimator
Date Tue, 27 Feb 2018 03:23:48 GMT
Thanks!

Our protobuf object is fairly complex. Even O(N) takes a lot of time.

On Mon, Feb 26, 2018 at 6:33 PM, 叶先进 <advancedxy@gmail.com> wrote:

> Hi Xin Liu,
>
> Could you provide a concrete use case if possible (code to reproduce the
> protobuf object, and a comparison between the protobuf object and a normal
> object)?
>
> I contributed a bit to SizeEstimator years ago, and to my understanding
> the time complexity should be O(N), where N is the number of fields
> referenced recursively.
>
> We should definitely investigate this case if it indeed takes a lot of
> time on protobuf objects.
>
>
> On 27 Feb 2018, at 8:47 AM, Xin Liu <xin.e.liu@gmail.com> wrote:
>
> Hi folks,
>
> We have a situation where the shuffled data is protobuf-based and
> SizeEstimator is taking a lot of time.
>
> We have tried overriding SizeEstimator to return a constant value, which
> speeds things up a lot.
>
> My question: what are the side effects of disabling SizeEstimator? Is it
> just that Spark has to do memory reallocation, or are there more severe
> consequences?
>
> Thanks!
>
>
>
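To make the O(N) point in the thread concrete, here is a minimal sketch of a size walk that visits every object reachable from a root exactly once. It is a toy in Python, not Spark's SizeEstimator, and the names `Node`, `build_tree` and `estimate_size` are hypothetical, introduced only for illustration. It suggests why a deeply nested message can be slow to measure even at linear cost: the work grows with the number of reachable objects, not with the number of top-level records.

```python
import sys


class Node:
    """Hypothetical stand-in for a nested message; not a real protobuf class."""

    def __init__(self):
        self.children = []


def build_tree(depth, fanout):
    """Build a tree of Node objects with the given depth and fanout."""
    node = Node()
    if depth > 0:
        node.children = [build_tree(depth - 1, fanout) for _ in range(fanout)]
    return node


def estimate_size(root):
    """Return (approximate bytes, objects visited) for everything reachable from root."""
    seen = set()    # ids of objects already counted; guards against cycles and shared refs
    stack = [root]  # explicit stack avoids recursion limits on deep graphs
    total_bytes = 0
    visited = 0
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        visited += 1
        total_bytes += sys.getsizeof(obj)  # shallow size of this object only
        if isinstance(obj, dict):
            stack.extend(obj.keys())
            stack.extend(obj.values())
        elif isinstance(obj, (list, tuple, set, frozenset)):
            stack.extend(obj)
        elif hasattr(obj, "__dict__"):
            stack.append(vars(obj))  # follow instance attributes
    return total_bytes, visited


if __name__ == "__main__":
    for depth in (2, 4, 8):
        size, count = estimate_size(build_tree(depth, 2))
        print(f"depth={depth} visited={count} approx_bytes={size}")
```

Each object is pushed and counted once, so the cost is linear in the size of the object graph. An estimator that returns a constant skips this walk entirely, which is consistent with the speed-up reported above, but it also means the reported size no longer tracks the real one.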
