beam-commits mailing list archives

From "Chamikara Jayalath (JIRA)" <>
Subject [jira] [Commented] (BEAM-522) Update FileSink.finalize_write() to be idempotent
Date Thu, 04 Aug 2016 01:22:20 GMT


Chamikara Jayalath commented on BEAM-522:

Actually, the bug is in the exists() implementation of

Instead of catching IOError, we should be catching HttpError and checking the error code to see
whether it is 404.
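
A minimal sketch of that check, for illustration only. The HttpError class below is a hypothetical stand-in for the HTTP client's exception type, and both the status_code attribute and the get_metadata callable are assumptions, not the actual SDK code:

```python
class HttpError(Exception):
    """Hypothetical stand-in for the HTTP client's error type."""

    def __init__(self, status_code, message=""):
        super().__init__(message or "HTTP %d" % status_code)
        self.status_code = status_code


def exists(path, get_metadata):
    """Return True if path exists, and False only when the service answers 404.

    get_metadata is a hypothetical callable that fetches object metadata
    and raises HttpError when the request fails.
    """
    try:
        get_metadata(path)
        return True
    except HttpError as error:
        if error.status_code == 404:
            return False
        # Any other failure (permissions, server errors) is not evidence
        # that the file is missing, so let it propagate to the caller.
        raise
```

Re-raising on anything other than 404 keeps a transient or permission failure from being misread as "file does not exist".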

With this fixed, FileSink.finalize_write() becomes properly idempotent, since we handle failures
of the rename() invocation at the following location.

> Update FileSink.finalize_write() to be idempotent
> -------------------------------------------------
>                 Key: BEAM-522
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py
>            Reporter: Chamikara Jayalath
>            Assignee: Chamikara Jayalath
> Currently FileSink.finalize_write() in [1] performs the following operations.
> (1) Obtains a list of temporary files as a side input
> (2) Renames each temporary file to the location where final output should be stored.
> The iobase.Sink.finalize_write() operation should be idempotent, since runner implementations
may call it multiple times due to task failures.
> The current implementation is not idempotent: if we re-run the operation after renaming
a subset of the files, it may fail because some files can no longer be found at the source
location (for example, [2] for GCS files).
> We can fix this by checking whether the destination file already exists before performing
the rename, and skipping the rename for files that are already at the destination.
> [1]
> [2]
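
The fix proposed in the issue description can be sketched as follows. This is an illustration on the local filesystem using os.rename, not the FileSink code itself; the function signature and the temp_to_final mapping are assumptions made for the example:

```python
import os


def finalize_write(temp_to_final):
    """Rename each temporary file to its final path; safe to call repeatedly.

    temp_to_final is a hypothetical dict mapping temporary path -> final path.
    Returns the list of final paths renamed during this call.
    """
    renamed = []
    for temp_path, final_path in sorted(temp_to_final.items()):
        if os.path.exists(final_path):
            # An earlier attempt already produced this output; skip it so a
            # retry does not fail looking for the missing source file.
            continue
        os.rename(temp_path, final_path)
        renamed.append(final_path)
    return renamed
```

A second call after a partial or complete run renames only what is still pending, so retries triggered by task failures leave the output unchanged.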

This message was sent by Atlassian JIRA
