spark-issues mailing list archives

From "Jungtaek Lim (Jira)" <j...@apache.org>
Subject [jira] [Created] (SPARK-30294) Read-only state store unnecessarily creates and deletes the temp file for delta file every batch
Date Wed, 18 Dec 2019 08:35:00 GMT
Jungtaek Lim created SPARK-30294:
------------------------------------

             Summary: Read-only state store unnecessarily creates and deletes the temp file for delta file every batch
                 Key: SPARK-30294
                 URL: https://issues.apache.org/jira/browse/SPARK-30294
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 3.0.0
            Reporter: Jungtaek Lim


[https://github.com/apache/spark/blob/d38f8167483d4d79e8360f24a8c0bffd51460659/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala#L143-L155]
{code:java}
    /** Abort all the updates made on this store. This store will not be usable any more. */
    override def abort(): Unit = {
      // This if statement is to ensure that files are deleted only if there are changes to the
      // StateStore. We have two StateStores for each task, one which is used only for reading, and
      // the other used for read+write. We don't want the read-only to delete state files.
      if (state == UPDATING) {
        state = ABORTED
        cancelDeltaFile(compressedStream, deltaFileStream)
      } else {
        state = ABORTED
      }
      logInfo(s"Aborted version $newVersion for $this")
    }
{code}
Despite the comment, the read-only state store goes through the same write preparation
as the read+write one: it creates the temporary file, initializes output streams for the
file, closes these output streams, and deletes the temporary file. This is unnecessary, and
it is confusing because, according to the log messages, two different instances appear to
write to the same delta file.
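
One possible way to avoid the unnecessary work is to create the temporary file and its output stream lazily, on the first write. The following is a minimal, self-contained sketch of that idea and not the actual Spark code or the eventual fix: the class LazyDeltaStore, its methods, and the temp file naming are hypothetical, and it uses the local file system through java.nio rather than Spark's checkpoint file manager.

```scala
import java.io.DataOutputStream
import java.nio.file.{Files, Path}

// Hypothetical, simplified stand-in for a state store instance.
// It is NOT HDFSBackedStateStoreProvider; it only illustrates lazy creation.
class LazyDeltaStore(dir: Path, version: Long) {
  private var aborted = false
  private var tempFile: Option[Path] = None

  // The temp file and its stream are created only on first access,
  // so a store that is never written to never touches the file system.
  private lazy val deltaStream: DataOutputStream = {
    val tmp = Files.createTempFile(dir, s"$version.delta.", ".tmp")
    tempFile = Some(tmp)
    new DataOutputStream(Files.newOutputStream(tmp))
  }

  // True once a write has forced the temp file into existence.
  def tempFileCreated: Boolean = tempFile.isDefined

  def put(key: String, value: String): Unit = {
    require(!aborted, "store already aborted")
    deltaStream.writeUTF(key)
    deltaStream.writeUTF(value)
  }

  def abort(): Unit = {
    // Clean up only if a write actually created the temp file.
    tempFile.foreach { tmp =>
      deltaStream.close()
      Files.deleteIfExists(tmp)
    }
    aborted = true
  }
}
```

With this shape, aborting an instance that was only read from creates and deletes nothing, while an instance that was written to still cleans up its temp file on abort. In this sketch, whether a temp file exists plays the role of the UPDATING check in the snippet above.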

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

