spark-user mailing list archives

From Russell Spitzer <>
Subject Re: Are Spark Dataframes mutable in Structured Streaming?
Date Wed, 15 May 2019 21:59:48 GMT
A DataFrame describes the calculation to be done, but the underlying
implementation is an "incremental query": the DataFrame code is executed
repeatedly, with Catalyst adjusting the final execution plan on each run.
Some parts of the plan refer to static pieces of data; others refer to
data that is pulled in on each iteration. None of this changes the
DataFrame objects themselves.
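
To make the distinction concrete, here is a rough toy model in plain
Python. It is not Spark code and does not use the Spark API: the names
Plan and run_incrementally are made up for illustration, and the
"engine" is only a loop. It sketches the idea that an immutable
description of a computation can be executed again and again over newly
arrived data, with any running state held by the engine rather than by
the description.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Plan:
    """Hypothetical stand-in for a DataFrame: an immutable list of steps."""
    steps: Tuple[Callable[[List[int]], List[int]], ...] = ()

    def filter(self, pred: Callable[[int], bool]) -> "Plan":
        # Returns a new Plan; the receiver is left untouched.
        return Plan(self.steps + (lambda rows: [r for r in rows if pred(r)],))

    def map(self, fn: Callable[[int], int]) -> "Plan":
        # Returns a new Plan; the receiver is left untouched.
        return Plan(self.steps + (lambda rows: [fn(r) for r in rows],))


def run_incrementally(plan: Plan, batches: List[List[int]]) -> List[int]:
    """Toy engine: run the same plan once per batch, keeping a running sum.

    The running sum is state owned by the engine, not by the plan.
    """
    total = 0
    results = []
    for batch in batches:
        rows = batch
        for step in plan.steps:
            rows = step(rows)
        total += sum(rows)
        results.append(total)
    return results


source = Plan()
evens_doubled = source.filter(lambda x: x % 2 == 0).map(lambda x: x * 2)

# Three "micro-batches" arrive over time; the plan object never changes.
outputs = run_incrementally(evens_doubled, [[1, 2, 3], [4, 5], [6]])
print(outputs)  # [4, 12, 24]
```

The result grows with every batch, yet source and evens_doubled are the
same objects before and after the run. Only the engine's state and the
data it reads have changed.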

On Wed, May 15, 2019 at 1:34 PM Sheel Pancholi <> wrote:

> Hi
> Structured Streaming treats a stream as an unbounded table in the form of
> a DataFrame. Continuously flowing data from the stream keeps getting added
> to this DataFrame (which is the unbounded table) which warrants a change to
> the DataFrame which violates the very basic nature of a DataFrame since a
> DataFrame by its nature is immutable. This sounds contradictory. Is there
> an explanation for this?
> Regards
> Sheel
