spark-user mailing list archives

From Michael Armbrust <mich...@databricks.com>
Subject Re: Need some Clarification on checkpointing w.r.t Spark Structured Streaming
Date Mon, 11 Sep 2017 21:26:18 GMT
Checkpoints record what has been processed by a specific query, so the
checkpoint location only needs to be set when writing (which is how you
"start" a query).

You can use the DataFrame created with readStream to start multiple
queries, so it wouldn't really make sense to have a single checkpoint there.
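
To illustrate the point above, here is a minimal sketch in PySpark (the same
pattern applies to the Java and Scala APIs). It assumes a Kafka source and two
sinks. The function name, broker address, topic name, and paths are all
hypothetical placeholders, not anything from the original question. One
DataFrame from readStream feeds two queries, and each query gets its own
checkpointLocation on the writeStream side.

```python
def start_two_queries(spark, checkpoint_root, output_dir):
    """Start two streaming queries from one streaming DataFrame.

    `spark` is expected to be a SparkSession. The broker, topic, and
    directory names below are hypothetical placeholders.
    """
    # No checkpoint is configured on the read side: this DataFrame only
    # describes the source and can be shared by several queries.
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker
        .option("subscribe", "events")                        # assumed topic
        .load()
    )

    # Query 1: its progress is tracked in its own checkpoint directory.
    console_query = (
        events.writeStream.format("console")
        .option("checkpointLocation", f"{checkpoint_root}/console")
        .start()
    )

    # Query 2: a separate checkpoint directory, because it is a separate
    # query with its own record of what has been processed.
    parquet_query = (
        events.writeStream.format("parquet")
        .option("checkpointLocation", f"{checkpoint_root}/parquet")
        .option("path", output_dir)
        .start()
    )

    return console_query, parquet_query
```

With a real session this might be called as
start_two_queries(spark, "hdfs:///checkpoints/demo", "hdfs:///data/events"),
again with placeholder paths. The two queries should not share a checkpoint
directory, since each directory holds the progress of exactly one query.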

On Mon, Sep 11, 2017 at 2:36 AM, kant kodali <kanth909@gmail.com> wrote:

> Hi All,
>
> I was wondering if we need to checkpoint both read and write streams when
> reading from Kafka and inserting into a target store?
>
> for example
>
> sparkSession.readStream().option("checkpointLocation", "hdfsPath").load()
>
> vs
>
> dataSet.writeStream().option("checkpointLocation", "hdfsPath")
>
> Thanks!
>
