spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From WangXiaolong <roland8...@163.com>
Subject Structured Streaming doesn't write checkpoint log when I use coalesce
Date Thu, 09 Aug 2018 11:38:03 GMT
Hi,


   Lately, I encountered a problem, when I was writing as structured streaming job to write
things into opentsdb.
  The write-stream part looks something like 


      outputDs
          .coalesce(14)
          .writeStream
          .outputMode("append")
          .trigger(Trigger.ProcessingTime(s"$triggerSeconds seconds"))
          .option("checkpointLocation",s"$checkpointDir/$appName/tsdb")
          .foreach {
            TsdbWriter(
              tsdbUrl,
              MongoProp(mongoUrl, mongoPort, mongoUser, mongoPassword, mongoDatabase, mongoCollection,mongoAuthenticationDatabase)
            )(createMetricBuilder(tsdbMetricPrefix))
          }
          .start()


    And when I check the checkpoint dir, I discover that the "/checkpoint/state" dir  is empty.
I looked into the executor's log and found that the HDFSBackedStateStoreProvider didn't write
anything on the checkpoint dir.


   Strange thing is, when I replace the "coalesce" function into "repartition" function, the
problem solved. Is there a difference between these two functions when using structured streaming?


  Looking forward to you help, thanks.

Mime
View raw message