spark-user mailing list archives

From Arun Manivannan
Subject Equivalent of emptyDataFrame in StructuredStreaming
Date Mon, 05 Nov 2018 23:29:14 GMT

I would like to create a "zero" value for a Structured Streaming DataFrame
and, unfortunately, I couldn't find any leads. With Spark batch, I can use
"emptyDataFrame", or "createDataFrame" with an "emptyRDD", but with
Structured Streaming I am lost.
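
For reference, the two batch options mentioned above could look roughly like this. It is only a sketch: the object name `BatchZero` and the two-column `errorSchema` are made up for illustration and do not come from the original program.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object BatchZero {
  // Hypothetical schema, for illustration only.
  val errorSchema: StructType = StructType(Seq(
    StructField("source", StringType, nullable = true),
    StructField("message", StringType, nullable = true)
  ))

  // Batch zero value with no columns and no rows.
  def untyped(spark: SparkSession): DataFrame =
    spark.emptyDataFrame

  // Batch zero value with the given columns and no rows.
  def withSchema(spark: SparkSession): DataFrame =
    spark.createDataFrame(spark.sparkContext.emptyRDD[Row], errorSchema)
}
```

Both results are batch DataFrames, which is exactly why they do not help on the streaming side.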

If I use the "emptyDataFrame" as the zero value, I wouldn't be able to join
it with any other DataFrames in the program because Spark doesn't allow
you to mix batch and stream DataFrames (isStreaming=false for the batch
ones).
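
A small check of the isStreaming flag makes the difference visible. This is a sketch under assumptions: the object name `IsStreamingCheck` is hypothetical, and the built-in "rate" source (available since Spark 2.2) merely stands in for whatever streaming source the real program uses.

```scala
import org.apache.spark.sql.SparkSession

object IsStreamingCheck {
  // Returns (flag of a batch DataFrame, flag of a streaming DataFrame).
  def flags(spark: SparkSession): (Boolean, Boolean) = {
    // Batch side: the usual zero value.
    val batchZero = spark.emptyDataFrame
    // Streaming side: "rate" is a built-in source meant for testing.
    val rateStream = spark.readStream.format("rate").load()
    (batchZero.isStreaming, rateStream.isStreaming)
  }
}
```

The first flag comes back false and the second true, so the batch zero value carries the wrong "kind" for a streaming pipeline.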

Any clue is greatly appreciated. Here are the alternatives that I have at
the moment.

*1. Reading from an empty file*
*Disadvantages: polling is expensive because it involves I/O, and it is
error-prone because someone might accidentally update the file.*

val emptyErrorStream = (spark: SparkSession) => {
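
The body of the function above is cut off in the archive. A file-backed empty stream might be sketched as follows; this is not the original code. The object name, the `emptyDir` parameter, the JSON format and the `errorSchema` columns are all assumptions, and `emptyDir` is assumed to point at an existing directory that stays empty.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object FileBackedEmptyStream {
  // Hypothetical schema; streaming file sources need one up front
  // unless schema inference has been enabled explicitly.
  val errorSchema: StructType = StructType(Seq(
    StructField("source", StringType, nullable = true),
    StructField("message", StringType, nullable = true)
  ))

  // Streaming DataFrame over a directory that is expected to stay empty.
  // Every trigger still lists the directory, which is the I/O cost noted above.
  def emptyErrorStream(spark: SparkSession, emptyDir: String): DataFrame =
    spark.readStream.schema(errorSchema).json(emptyDir)
}
```

The result has isStreaming=true, but anything dropped into that directory later would show up in the stream.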

*2. Use MemoryStream*

*Disadvantages: MemoryStream itself is not recommended for production
use because it can be mutated, but I am converting it to a Dataset
immediately. So, I am leaning towards this at the moment.*

val emptyErrorStream = (spark:SparkSession) => {
  implicit val sqlC = spark.sqlContext
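
The rest of this function is also cut off in the archive. One way the MemoryStream approach could be completed is sketched below. The `ErrorRecord` case class and the object name are hypothetical, and the implicit SQLContext follows the `MemoryStream.apply` signature as I understand it for Spark 2.x/3.x; later versions may expect a different implicit.

```scala
import org.apache.spark.sql.{Dataset, SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream

// Hypothetical record type, for illustration only.
case class ErrorRecord(source: String, message: String)

object MemoryBackedEmptyStream {
  def emptyErrorStream(spark: SparkSession): Dataset[ErrorRecord] = {
    import spark.implicits._ // provides the Encoder[ErrorRecord]
    implicit val sqlContext: SQLContext = spark.sqlContext
    // The MemoryStream handle is never returned, so callers cannot addData to it.
    MemoryStream[ErrorRecord].toDS()
  }
}
```

Because only the Dataset escapes the function, nothing outside it can push rows into the stream, which addresses the mutability concern to some extent.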

