spark-user mailing list archives

From eugen.wintersber...@gmail.com
Subject Appending a static dataframe to a stream-created Parquet file fails
Date Thu, 02 Sep 2021 13:03:56 GMT
Hi all,
  I recently stumbled upon a rather strange problem with streaming
sources in one of my tests. I write a Parquet file from a streaming
source and then try to append the same data, this time from a static
DataFrame. Surprisingly, the number of rows in the Parquet file remains
the same after the append operation.
Here is the relevant code:

  "Appending data from static dataframe" must "produce twice as much data" in {
    logLinesStream.writeStream
      .format("parquet")
      .option("path", path.toString)
      .outputMode("append")
      .start()
      .processAllAvailable()
    spark.read.format("parquet").load(path.toString).count mustBe 1159

    logLinesDF.write.format("parquet").mode("append").save(path.toString)
    spark.read.format("parquet").load(path.toString).count mustBe 2*1159
  }
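A quick way to see what the two writes actually left in `path` is to list the directory after each step. The sketch below is a hypothetical diagnostic helper (`OutputDirCheck` and `describeOutputDir` are made-up names, not Spark APIs). It assumes a local filesystem path and that data files follow Spark's usual `part-*.parquet` naming. It also reports whether a `_spark_metadata` subdirectory exists: the streaming file sink keeps its log of committed files there, and that log may affect which files a later `spark.read` picks up.

```scala
import java.io.File

// Hypothetical diagnostic helper, not part of Spark: summarises the contents
// of a Parquet output directory after streaming and/or batch writes.
object OutputDirCheck {
  final case class Summary(hasStreamingLog: Boolean, partFiles: Seq[String])

  def describeOutputDir(dir: File): Summary = {
    // listFiles returns null when dir does not exist or is not a directory.
    val names = Option(dir.listFiles()).map(_.toSeq.map(_.getName)).getOrElse(Seq.empty)
    Summary(
      // Directory in which the streaming file sink records its committed files.
      hasStreamingLog = names.contains("_spark_metadata"),
      // Assumes Spark's usual naming; skips _SUCCESS and hidden .crc files.
      partFiles = names.filter(n => n.startsWith("part-") && n.endsWith(".parquet")).sorted
    )
  }

  def main(args: Array[String]): Unit = {
    val summary = describeOutputDir(new File(args.headOption.getOrElse(".")))
    println(s"_spark_metadata present: ${summary.hasStreamingLog}")
    println(s"Parquet part files: ${summary.partFiles.size}")
    summary.partFiles.foreach(println)
  }
}
```

For example, calling `OutputDirCheck.describeOutputDir(new java.io.File(path.toString))` after the streaming write and again after the batch append should show whether the append added new part files next to the existing ones, even though the row count does not change.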

Does anyone have an idea what I am doing wrong here?

Thanks in advance,
 Eugen Wintersberger
