Hi, 

I would like to create a "zero" value for a Structured Streaming DataFrame, but I couldn't find any leads. With Spark batch, I can use "emptyDataFrame", or "createDataFrame" with an "emptyRDD", but with Structured Streaming I am lost.

If I use "emptyDataFrame" as the zero value, I can't join it with any other DataFrame in the program, because Spark doesn't allow mixing batch and streaming DataFrames (isStreaming=false for the batch ones).
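To make the mismatch concrete, here is a minimal sketch that only prints the isStreaming flag for a batch and a streaming DataFrame. It assumes a local SparkSession and uses the built-in "rate" source purely as a stand-in for a real stream; the object name and app name are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: compares the isStreaming flag of a batch and a streaming DataFrame.
object IsStreamingCheck {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for this check.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("is-streaming-check")
      .getOrCreate()

    // The batch "zero" value: an empty DataFrame.
    val batchEmpty = spark.emptyDataFrame

    // A streaming DataFrame; the "rate" source stands in for the real input.
    val streaming = spark.readStream.format("rate").load()

    println(s"batch  isStreaming = ${batchEmpty.isStreaming}")
    println(s"stream isStreaming = ${streaming.isStreaming}")

    spark.stop()
  }
}
```

The batch DataFrame reports isStreaming=false and the rate-based one reports isStreaming=true, which is the difference I am running into.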

Any clue is greatly appreciated. Here are the alternatives that I have at the moment. 

1. Reading from an empty file 
Disadvantages: polling is expensive because it involves IO, and it is error-prone in the sense that someone might accidentally update the file.
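val emptyErrorStream = (spark: SparkSession) => {
  import spark.implicits._ // brings the Encoder required by .as[DataError] into scope
  spark
    .readStream
    .format("csv")
    .schema(DataErrorSchema) // file sources need an explicit schema when streaming
    .load("/Users/arunma/IdeaProjects/OSS/SparkDatalakeKitchenSink/src/test/resources/dummy1.txt")
    .as[DataError]
}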

2. Use MemoryStream
Disadvantages: MemoryStream itself is not recommended for production use because it can be mutated, but I am converting it to a Dataset immediately. So I am leaning towards this option at the moment.

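import org.apache.spark.sql.execution.streaming.MemoryStream

val emptyErrorStream = (spark: SparkSession) => {
  import spark.implicits._ // Encoder[DataError] for MemoryStream
  implicit val sqlC = spark.sqlContext // MemoryStream takes an implicit SQLContext
  MemoryStream[DataError].toDS()
}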
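For context, this is roughly how I intend to use the empty stream as a zero value: as the starting point of a fold that unions several error streams. It is only a sketch. The DataError case class below is a hypothetical one-field stand-in for the real one, the two input streams are themselves MemoryStreams used as placeholders, and it assumes the Spark 2.x/3.x MemoryStream API that takes an implicit SQLContext.

```scala
import org.apache.spark.sql.{Dataset, SQLContext, SparkSession}
import org.apache.spark.sql.execution.streaming.MemoryStream

// Hypothetical stand-in: the real DataError fields are not shown here.
case class DataError(message: String)

object EmptyStreamAsZero {
  // Same idea as option 2: an empty streaming Dataset backed by a MemoryStream.
  def emptyErrorStream(spark: SparkSession): Dataset[DataError] = {
    import spark.implicits._
    implicit val sqlC: SQLContext = spark.sqlContext
    MemoryStream[DataError].toDS()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("empty-stream-zero")
      .getOrCreate()
    import spark.implicits._
    implicit val sqlC: SQLContext = spark.sqlContext

    // Placeholder error streams; in the real program these come from other sources.
    val errorsA = MemoryStream[DataError].toDS()
    val errorsB = MemoryStream[DataError].toDS()

    // The empty stream is the zero value of the fold; union keeps everything streaming.
    val allErrors = Seq(errorsA, errorsB).foldLeft(emptyErrorStream(spark))(_ union _)

    println(s"isStreaming = ${allErrors.isStreaming}")

    spark.stop()
  }
}
```

Because every operand is a streaming Dataset, the union does not mix batch and stream and the result still has isStreaming=true.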
Cheers,
Arun