Chamikara Jayalath created BEAM-522: --------------------------------------- Summary: Update FileSink.finalize_write() to be idempotent Key: BEAM-522 URL: https://issues.apache.org/jira/browse/BEAM-522 Project: Beam Issue Type: Bug Components: sdk-py Reporter: Chamikara Jayalath Assignee: Chamikara Jayalath Currently FileSink.finelize_write() in fileio.py [1] performs following operations. (1) Obtains a list of temporary files as a side input (2) Renames each temporary file to the location where final output should be stored. iobase.Sink.finalize_write() operation should be idempotent since runner implementations may call this operation multiple times due to task failures. Current implementation is not idempotent because if we re-run the operation after renaming a sub-set of files, the operations may fail due to not being able to find some files at source location (for example, [2] for GCS files). We can fix this by checking if the destination file is already available before performing the rename and not performing the rename for files that are already available at the destination. [1] https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/fileio.py#L503 [2] https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/apache_beam/io/gcsio.py#L187 -- This message was sent by Atlassian JIRA (v6.3.4#6332)