beam-commits mailing list archives

From "Amit Sela (JIRA)" <>
Subject [jira] [Commented] (BEAM-1556) Spark executors need to register IO factories
Date Tue, 28 Feb 2017 13:09:45 GMT


Amit Sela commented on BEAM-1556:

My reasoning for having this in the SDK (or better, the Runner API) is that the
runner would have to initialize the registration on every instance, mostly workers (the implementation
of {{PipelineRunner}} would probably take care of it for the "Driver" instance).
Not every {{DoFn}} requires this, and neither does every reader/writer, so the runner would either
initialize all the time (whether it is needed or not) or have to be patched for every new
use case: Read, Write, DoFn...
I'm not sure I'm going to like the following suggestion (fighting with myself a bit here),
but how about a {{FileSystemContext}}? The runner would have to initialize it and
pass it on to the SDK.

Not sure here... thoughts?
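
To make the {{FileSystemContext}} idea a bit more concrete, here is a minimal sketch in plain Java. Everything in it is an assumption for illustration: {{FileSystemContext}} is not an existing Beam class, {{ensureInitialized()}} and {{isInitialized()}} are hypothetical names, and the {{Runnable}} stands in for whatever registration call the runner would supply (for example the {{IOChannelUtils.registerIOFactories(options)}} call from the issue description). The point is only that the runner builds the context once per worker and the SDK triggers registration lazily, so it runs at most once and only when a Read, Write or DoFn actually needs it.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch, not an existing Beam class.
 * Wraps a registration action supplied by the runner and runs it at most once.
 */
final class FileSystemContext {
    // Registration action provided by the runner (assumption: a plain Runnable).
    private final Runnable registration;
    private boolean initialized = false;

    FileSystemContext(Runnable registration) {
        this.registration = registration;
    }

    /** Called by SDK code right before it needs a file system; safe to call repeatedly. */
    synchronized void ensureInitialized() {
        if (!initialized) {
            registration.run();
            initialized = true;
        }
    }

    synchronized boolean isInitialized() {
        return initialized;
    }
}

final class FileSystemContextDemo {
    public static void main(String[] args) {
        // Counts how often the (stand-in) registration actually runs.
        AtomicInteger calls = new AtomicInteger();
        FileSystemContext context = new FileSystemContext(() -> calls.incrementAndGet());

        // Two SDK components asking for file system access on the same worker.
        context.ensureInitialized();
        context.ensureInitialized();

        System.out.println("registration ran " + calls.get() + " time(s)");
    }
}
```

With this shape, a worker that never touches a file system never pays for the registration, and the runner does not need a separate patch for each new use case. How the context would reach the workers (serialization, a static holder, etc.) is left open here.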

> Spark executors need to register IO factories
> ---------------------------------------------
>                 Key: BEAM-1556
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Frances Perry
>            Assignee: Jean-Baptiste Onofré
> The Spark executors need to call IOChannelUtils.registerIOFactories(options) in order to support GCS file and make the default WordCount example work.
> Context in this thread:

This message was sent by Atlassian JIRA
