beam-commits mailing list archives

From "Eugene Kirpichov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-2712) SerializablePipelineOptions should not call FileSystems.setDefaultPipelineOptions.
Date Tue, 01 Aug 2017 23:52:02 GMT
Eugene Kirpichov created BEAM-2712:
--------------------------------------

             Summary: SerializablePipelineOptions should not call FileSystems.setDefaultPipelineOptions.
                 Key: BEAM-2712
                 URL: https://issues.apache.org/jira/browse/BEAM-2712
             Project: Beam
          Issue Type: Bug
          Components: runner-apex, runner-core, runner-flink, runner-spark
            Reporter: Eugene Kirpichov
            Assignee: Kenneth Knowles


https://github.com/apache/beam/pull/3654 introduces SerializablePipelineOptions, which on
deserialization calls FileSystems.setDefaultPipelineOptions.

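This is problematic and racy when the same process deserializes several SerializablePipelineOptions
instances that carry different filesystem-related options: each deserialization overwrites
the process-wide default, so whichever instance is deserialized last wins.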
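
To illustrate the hazard, here is a minimal self-contained sketch. It does not use Beam: GlobalSideEffectDemo, OptionsWithSideEffect and globalFsConfig are hypothetical stand-ins for SerializablePipelineOptions and the process-wide state behind FileSystems.setDefaultPipelineOptions. It only assumes that the deserialization hook writes to global state, as described above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-ins for illustration; none of these are Beam classes.
final class GlobalSideEffectDemo {

  // Stand-in for the process-wide default filesystem options.
  static volatile String globalFsConfig = null;

  // Stand-in for an options wrapper whose deserialization mutates global state.
  static final class OptionsWithSideEffect implements Serializable {
    private static final long serialVersionUID = 1L;
    final String fsConfig;

    OptionsWithSideEffect(String fsConfig) {
      this.fsConfig = fsConfig;
    }

    private void readObject(ObjectInputStream in)
        throws IOException, ClassNotFoundException {
      in.defaultReadObject();
      // The problematic part: a hidden write to global state on every deserialization.
      globalFsConfig = fsConfig;
    }
  }

  // Serializes and deserializes a value in memory, triggering readObject.
  @SuppressWarnings("unchecked")
  static <T extends Serializable> T roundTrip(T value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(value);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (T) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  // Deserializes two differently configured options objects in the same process.
  static String demo() {
    OptionsWithSideEffect a = roundTrip(new OptionsWithSideEffect("project-a"));
    roundTrip(new OptionsWithSideEffect("project-b"));
    // Code still working with 'a' now sees the global default that came from 'b'.
    return a.fsConfig + " vs " + globalFsConfig;
  }
}
```

With two threads deserializing concurrently, the order of the two writes (and therefore the resulting global default) is not deterministic, which is the race.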

The PR does this because the Flink and Apex runners were already doing it in their respective
SerializablePipelineOptions-like classes (which the PR removes); Spark was not, but probably
should have been.

I believe this is done for the sake of having the proper filesystem options automatically
available on workers in all places where any kind of PipelineOptions is used. Instead, all
3 runners should pick a better place to initialize their workers, and explicitly call FileSystems.setDefaultPipelineOptions
there.
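
One possible shape for such an explicit initialization point is sketched below. This is not code from any runner: WorkerFileSystemInit, initOnce and current are hypothetical names, and a plain String stands in for the PipelineOptions. In a real runner the body of initOnce is where the call to FileSystems.setDefaultPipelineOptions would go. Rejecting a conflicting second initialization is an assumption of this sketch, not something the issue prescribes.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical worker-side initializer; not part of Beam or any runner.
final class WorkerFileSystemInit {

  // Stand-in for the process-wide default filesystem options.
  private static final AtomicReference<String> DEFAULT_FS_CONFIG =
      new AtomicReference<>();

  private WorkerFileSystemInit() {}

  // Intended to be called explicitly from the runner's worker startup path,
  // never as a side effect of deserializing an options object.
  static void initOnce(String fsConfig) {
    Objects.requireNonNull(fsConfig, "fsConfig");
    if (DEFAULT_FS_CONFIG.compareAndSet(null, fsConfig)) {
      return; // First initialization wins.
    }
    String existing = DEFAULT_FS_CONFIG.get();
    if (!existing.equals(fsConfig)) {
      // Surface a conflict instead of silently overwriting the default.
      throw new IllegalStateException(
          "Already initialized with " + existing + ", refusing " + fsConfig);
    }
  }

  // Returns the configured default, or null if initOnce has not run yet.
  static String current() {
    return DEFAULT_FS_CONFIG.get();
  }
}
```

Repeated calls with the same configuration are harmless, while a conflicting call fails loudly, so the situation described above cannot go unnoticed.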

It would be even better if FileSystems.setDefaultPipelineOptions didn't exist at all, but
that's a topic for a separate JIRA.

CC'ing runner contributors [~aljoscha] [~aviemzur] [~thw]



