spark-dev mailing list archives

From Wenchen Fan <cloud0...@gmail.com>
Subject Re: AccumulatorV2 vs AccumulableParam (V1)
Date Thu, 03 May 2018 14:20:59 GMT
Hi Sergey,

Thanks for your valuable feedback!

For 1: yea this is definitely a bug and I have sent a PR to fix it.
For 2: I have left my comments on the JIRA ticket.
For 3: I don't quite understand it, can you give some concrete examples?
For 4: yea this is a problem, but I think it's not a big deal, and we
couldn't find a better solution at that time.
For 5: I think this is a real problem. It seems to me that we could merge
`isZero`, `copyAndReset`, `copy`, `reset` into one API: `zero`, which is
basically just `copyAndReset`. If there is a way to do this without
breaking the existing API, I'd be really happy to do it.
For 6: same as 4. It's a problem but not a big deal.
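
The single-`zero` idea in point 5 might look roughly like the following sketch. This is a hypothetical API, not anything that exists in Spark; the class and method names (`SimpleAccumulator`, `LongSum`) are illustrative only:

```scala
// Hypothetical sketch of a simplified accumulator API in which a single
// `zero` method (returning a fresh, empty accumulator) plays the role of
// isZero, copy, reset and copyAndReset. Not a real Spark class.
abstract class SimpleAccumulator[IN, OUT] extends Serializable {
  def zero: SimpleAccumulator[IN, OUT] // a fresh, empty accumulator
  def add(v: IN): Unit
  def merge(other: SimpleAccumulator[IN, OUT]): Unit
  def value: OUT
}

// Under this sketch, a long-sum accumulator needs only four methods.
class LongSum extends SimpleAccumulator[Long, Long] {
  private var sum = 0L
  def zero: LongSum = new LongSum
  def add(v: Long): Unit = { sum += v }
  def merge(other: SimpleAccumulator[Long, Long]): Unit = { sum += other.value }
  def value: Long = sum
}
```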

In general, I think accumulator v2 sacrifices some flexibility to simplify
the framework and improve the performance. Users can still use accumulator
v1 if flexibility is more important to them. We can keep improving
accumulator v2 without breaking backward compatibility.

Thanks,
Wenchen

On Thu, May 3, 2018 at 6:20 AM, Sergey Zhemzhitsky <szh.subs@gmail.com>
wrote:

> Hello guys,
>
> I've started to migrate my Spark jobs which use Accumulators V1 to
> AccumulatorV2 and faced with the following issues:
>
> 1. LegacyAccumulatorWrapper now requires the resulting type of the
> AccumulableParam to implement equals. Otherwise the AccumulableParam,
> once automatically wrapped into a LegacyAccumulatorWrapper, will fail
> with an AssertionError (SPARK-23697 [1]).
>
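>

The equals requirement can be illustrated without any Spark classes: an equality-based "is zero" check (comparing the current value against a zero value with `==`) only works if the result type has a structural equals, which a plain class does not provide but a case class does. `BadResult` and `GoodResult` are made-up names for illustration:

```scala
// A result type WITHOUT structural equals: == falls back to reference
// equality, so two separately created "zero" values never compare equal.
class BadResult(var sum: Long)

// A case class gets a generated structural equals/hashCode, which is what
// an equality-based zero check needs.
case class GoodResult(sum: Long)

val badZeroCheck = new BadResult(0L) == new BadResult(0L) // false
val goodZeroCheck = GoodResult(0L) == GoodResult(0L)      // true
```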
> 2. Existing AccumulatorV2 classes are hard to extend easily and
> correctly (SPARK-24154 [2]) because of their "copy" method, which is
> called during serialization and usually loses the type information of
> descendant classes that don't override "copy" (it's often easier to
> implement an accumulator from scratch than to override it correctly).
>
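>

The slicing problem can be sketched without Spark. The stand-in base class below only mirrors the copy-based pattern (it is not the real AccumulatorV2), but it shows how a subclass that forgets to override "copy" silently degrades to the parent type:

```scala
// Minimal stand-in for the copy-based accumulator pattern, not the real
// Spark class.
class CountAcc extends Serializable {
  protected var count = 0L
  def add(n: Long): Unit = { count += n }
  def value: Long = count
  // copy() hard-codes the parent type; any code path that clones the
  // accumulator via copy() drops subclass state and behavior.
  def copy(): CountAcc = { val c = new CountAcc; c.count = count; c }
}

class EvenCountAcc extends CountAcc {
  override def add(n: Long): Unit = if (n % 2 == 0) super.add(n)
  // Without this override, copy() would return a plain CountAcc and the
  // even-only filtering would be lost after copying.
  override def copy(): EvenCountAcc = {
    val c = new EvenCountAcc; c.count = count; c
  }
}
```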
> 3. The same instance of AccumulatorV2 cannot be used with the same
> SparkContext multiple times (unlike AccumulableParam), failing with
> "IllegalStateException: Cannot register an Accumulator twice" even
> after the "reset" method has been called. So it's impossible to
> unregister an already registered accumulator from user code.
>
> 4. AccumulableParam (V1) implementations are usually more or less
> stateless, while AccumulatorV2 implementations are almost always
> stateful, leading to (unnecessary?) type checks (unlike
> AccumulableParam). For example, a typical "merge" method of an
> AccumulatorV2 has to check whether the accumulator passed to it is of
> the appropriate type, as here [3].
>
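>

The type check in question follows the pattern of Spark's own LongAccumulator.merge [3]; the sketch below reproduces that pattern against a stand-in base class (`AccBase`, `SumAcc`, `MaxAcc` are illustrative names) so it compiles without Spark on the classpath:

```scala
// Stand-in for the AccumulatorV2 shape, not the real Spark class.
abstract class AccBase[IN, OUT] extends Serializable {
  def add(v: IN): Unit
  def merge(other: AccBase[IN, OUT]): Unit
  def value: OUT
}

class SumAcc extends AccBase[Long, Long] {
  private var sum = 0L
  def add(v: Long): Unit = { sum += v }
  def value: Long = sum
  // merge must pattern-match on the concrete type, because `other` is only
  // statically known to be an AccBase[Long, Long].
  def merge(other: AccBase[Long, Long]): Unit = other match {
    case o: SumAcc => sum += o.value
    case _ => throw new UnsupportedOperationException(
      s"Cannot merge ${getClass.getName} with ${other.getClass.getName}")
  }
}

class MaxAcc extends AccBase[Long, Long] {
  private var mx = Long.MinValue
  def add(v: Long): Unit = { mx = math.max(mx, v) }
  def value: Long = mx
  def merge(other: AccBase[Long, Long]): Unit = other match {
    case o: MaxAcc => mx = math.max(mx, o.value)
    case _ => throw new UnsupportedOperationException(
      s"Cannot merge ${getClass.getName} with ${other.getClass.getName}")
  }
}
```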
> 5. AccumulatorV2 is more difficult to implement correctly than
> AccumulableParam. In case of AccumulableParam I have to implement just
> 3 methods (addAccumulator, addInPlace, zero), in case of
> AccumulatorParam just 2 methods (addInPlace, zero), but in case of
> AccumulatorV2 6 methods (isZero, copy, reset, add, merge, value).
>
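>

The 2-vs-6 method count can be made concrete with stand-in types that only mirror the shapes of the two APIs (they are not the real Spark traits):

```scala
// V1 shape: an AccumulatorParam-like trait needs only 2 methods.
trait AccumulatorParamLike[T] {
  def zero(initial: T): T
  def addInPlace(a: T, b: T): T
}

// V2 shape: an AccumulatorV2-like class needs 6 methods.
abstract class AccumulatorV2Like[IN, OUT] extends Serializable {
  def isZero: Boolean
  def copy(): AccumulatorV2Like[IN, OUT]
  def reset(): Unit
  def add(v: IN): Unit
  def merge(other: AccumulatorV2Like[IN, OUT]): Unit
  def value: OUT
}

object LongParam extends AccumulatorParamLike[Long] {
  def zero(initial: Long): Long = 0L
  def addInPlace(a: Long, b: Long): Long = a + b
}

class LongAcc extends AccumulatorV2Like[Long, Long] {
  private var sum = 0L
  def isZero: Boolean = sum == 0L
  def copy(): LongAcc = { val c = new LongAcc; c.sum = sum; c }
  def reset(): Unit = { sum = 0L }
  def add(v: Long): Unit = { sum += v }
  def merge(other: AccumulatorV2Like[Long, Long]): Unit = { sum += other.value }
  def value: Long = sum
}
```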
> 6. AccumulatorV2 classes can hardly be anonymous classes, because
> their "copy" and "merge" methods typically require a concrete class to
> perform a type check.
>
> I understand the motivation for AccumulatorV2 (SPARK-14654 [4]), but
> I'm wondering whether there is a way to simplify the AccumulatorV2 API
> to address the points described above and make it less error prone?
>
>
> [1] https://issues.apache.org/jira/browse/SPARK-23697
> [2] https://issues.apache.org/jira/browse/SPARK-24154
> [3] https://github.com/apache/spark/blob/4f5bad615b47d743b8932aea1071652293981604/core/src/main/scala/org/apache/spark/util/AccumulatorV2.scala#L348
> [4] https://issues.apache.org/jira/browse/SPARK-14654
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
>
>
