spark-dev mailing list archives

From Reynold Xin <r...@databricks.com>
Subject Re: groupByKey() and keys with many values
Date Tue, 08 Sep 2015 20:58:18 GMT
On Tue, Sep 8, 2015 at 6:51 AM, Antonio Piccolboni <antonio@piccolboni.info>
wrote:

> As far as the DB writes, remember Spark can retry a computation, so your
> writes have to be idempotent (see this thread
> <https://groups.google.com/forum/#!topic/spark-users/oM-IzQs0Z2s>, in
> which Reynold is a bit more optimistic about failures than I am comfortable
> with, but who am I to question Reynold?)
>

I'm wrong all the time so please do question me :)

One thing is that apps should be using something like an output committer
to enforce idempotency. Maybe that's some API we can provide in Spark
itself to make it easier to write applications.
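
To make the committer idea concrete, here is a minimal sketch in plain Python. It is not Spark's or Hadoop's actual OutputCommitter API. The function name commit_partition, the part-NNNNN file naming, and the attempt_id parameter are all hypothetical, and the sketch assumes a local filesystem plus a task that produces the same records on every retry. Each attempt stages its output under an attempt-specific name and then renames it into place, so a retried task leaves exactly one copy of the partition.

```python
import os
import tempfile


def commit_partition(out_dir, partition_id, attempt_id, records):
    """Write one partition so that retried attempts leave a single copy.

    Returns True if this attempt committed the output, False if an
    earlier attempt had already committed it.
    """
    final_path = os.path.join(out_dir, f"part-{partition_id:05d}")
    if os.path.exists(final_path):
        # An earlier attempt already committed this partition: do nothing.
        return False

    # Stage the output under a name unique to this attempt (hypothetical
    # naming scheme), so concurrent attempts never share a staging file.
    tmp_path = os.path.join(out_dir, f"_tmp-{partition_id:05d}-{attempt_id}")
    with open(tmp_path, "w") as f:
        for rec in records:
            f.write(f"{rec}\n")

    # os.replace renames atomically within one filesystem, so readers
    # never see a half-written final file.
    os.replace(tmp_path, final_path)
    return True


if __name__ == "__main__":
    out = tempfile.mkdtemp()
    print(commit_partition(out, 0, 0, ["a", "b"]))  # first attempt
    print(commit_partition(out, 0, 1, ["a", "b"]))  # retry of the same task
    print(sorted(os.listdir(out)))
```

The same shape carries over to a database only if the final step is itself idempotent, for example an upsert keyed on the partition id. That is an assumption about the target store, not something this sketch provides.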
