spark-dev mailing list archives

From Erik Erlandson <eerla...@redhat.com>
Subject Re: Data Property Accumulators
Date Thu, 22 Aug 2019 00:18:39 GMT
I'm wondering whether keeping track of accumulation in "consistent mode" is
really a case for mapping straight to a Try value, so that parsedData has
type RDD[Try[...]] and counting failures is
parsedData.filter(_.isFailure).count, etc.
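A minimal sketch of that Try-based approach, assuming Spark 3.x on the classpath and a local-mode session. The `TryCounting` object, its `parse` helper, and the integer-parsing input are hypothetical stand-ins for whatever parsedData is actually built from:

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import scala.util.{Success, Try}

object TryCounting {
  // Hypothetical parser: map each raw record straight to a Try
  // instead of bumping an accumulator when parsing fails.
  def parse(raw: RDD[String]): RDD[Try[Int]] =
    raw.map(s => Try(s.trim.toInt))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("try-counting-sketch")
      .getOrCreate()
    try {
      // Hypothetical input: "oops" and "" are not valid integers.
      val raw = spark.sparkContext.parallelize(Seq("1", "2", "oops", "4", "", "6"))
      // Cached because two actions below reuse it.
      val parsedData: RDD[Try[Int]] = parse(raw).cache()

      // The failure count is the result of an ordinary action, so a retried
      // or speculative task contributes its partition's result only once.
      val failures: Long = parsedData.filter(_.isFailure).count()

      // Successful values keep flowing as regular data.
      val good: RDD[Int] = parsedData.collect { case Success(v) => v }
      println(s"failures = $failures, sum of good values = ${good.sum()}")
    } finally {
      spark.stop()
    }
  }
}
```

The point of the design is that failures are just data: they follow the same lineage, recomputation, and retry semantics as every other record, unlike an accumulator updated inside `map`.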

Put another way: consistent-mode accumulation seems (to me) to be trying to
obey Spark's RDD compute model, whereas legacy accumulators subvert that
model. I think the fact that your "option 3" sends accumulator information
down through the mapping-function API, and also passes through an "Option"
stage, hints at the same idea.

That might mean the idiomatic way to do consistent mode is through the
existing Spark API, using constructs like Try, Either, Option, Tuple, or
just a new column carrying additional accumulator channels.
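A sketch of the "extra channel" variant, assuming Spark 3.x: each record is paired (as a Tuple) with an Either for its outcome and a small counter struct, and the counters are summed with an ordinary `fold` action. The `Counts` case class, the `ChannelCounting` object, `classify`, and the "empty"/"malformed" categories are all hypothetical; a DataFrame version would carry the same counters in an extra column instead.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import scala.util.Try

// Hypothetical counter "channel" that travels alongside each record.
final case class Counts(parsed: Long, empty: Long, malformed: Long) {
  def +(o: Counts): Counts =
    Counts(parsed + o.parsed, empty + o.empty, malformed + o.malformed)
}

object Counts {
  val zero: Counts = Counts(0L, 0L, 0L)
}

object ChannelCounting {
  // Outcome (Left = error description, Right = value) plus the counter
  // contribution this single record makes.
  def classify(s: String): (Either[String, Int], Counts) = {
    val t = s.trim
    if (t.isEmpty) (Left("empty"), Counts(0L, 1L, 0L))
    else Try(t.toInt).toOption match {
      case Some(v) => (Right(v), Counts(1L, 0L, 0L))
      case None    => (Left(s"malformed: $t"), Counts(0L, 0L, 1L))
    }
  }

  def parse(raw: RDD[String]): RDD[(Either[String, Int], Counts)] =
    raw.map(s => classify(s))

  // Totals come from an action, so each partition's result is merged once
  // even if a task was retried.
  def totals(parsed: RDD[(Either[String, Int], Counts)]): Counts =
    parsed.map(_._2).fold(Counts.zero)(_ + _)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("channel-counting-sketch")
      .getOrCreate()
    try {
      val raw = spark.sparkContext.parallelize(Seq("1", "2", "", "abc", "5"))
      // Cached so the two actions below do not re-parse the input.
      val parsed = parse(raw).cache()
      val counts = totals(parsed)
      val good: RDD[Int] = parsed.collect { case (Right(v), _) => v }
      println(s"$counts, good values: ${good.collect().mkString(",")}")
    } finally {
      spark.stop()
    }
  }
}
```

Caching is only an efficiency choice here: without it the input is parsed twice, but the counts stay correct because they are derived from action results rather than side effects.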


On Fri, Aug 16, 2019 at 5:48 PM Holden Karau <holden@pigscanfly.ca> wrote:

> Are folks interested in seeing data property accumulators for RDDs? I made
> a proposal for this back in Spark 2016 (
> https://docs.google.com/document/d/1lR_l1g3zMVctZXrcVjFusq2iQVpr4XvRK_UUDsDr6nk/edit) but
> ABI compatibility was a stumbling block I couldn't design around. I can
> look at reviving it for Spark 3 or just go ahead and close out this idea.
>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>
