beam-commits mailing list archives

From "Davor Bonaci (JIRA)"
Subject [jira] [Commented] (BEAM-1556) Spark executors need to register IO factories
Date Tue, 28 Feb 2017 17:28:45 GMT


Davor Bonaci commented on BEAM-1556:

Ack the higher point -- this is an SDK-specific requirement.

I think it would make sense for the SDK to do this _if_ we had init & cleanup methods
at various scopes (per worker, JVM, process, thread, task, step, etc.). The runner would then
call the appropriately scoped init method, and the SDK could do the registration there
(trivially). We need such methods for other reasons too, but don't have them today. Today,
the SDK could perhaps call it before every step (every DoFn, source, etc.), which seems like
overkill.
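
The scoped init & cleanup idea could be sketched roughly as follows. This is only an
illustration: Scope, ScopedLifecycle, and LifecycleRegistry are hypothetical names, not
existing Beam APIs, and the real design might look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scopes at which a runner could invoke SDK hooks (not a Beam API).
enum Scope { WORKER, JVM, PROCESS, THREAD, TASK, STEP }

// Hypothetical hook that the SDK would implement for one scope.
interface ScopedLifecycle {
    Scope scope();
    void init();
    void cleanup();
}

// Hypothetical registry: the SDK registers hooks, the runner enters and exits scopes.
final class LifecycleRegistry {
    private final List<ScopedLifecycle> hooks = new ArrayList<>();

    void register(ScopedLifecycle hook) {
        hooks.add(hook);
    }

    // Called by the runner when a scope begins; runs only the hooks for that scope.
    void enter(Scope scope) {
        for (ScopedLifecycle hook : hooks) {
            if (hook.scope() == scope) {
                hook.init();
            }
        }
    }

    // Called by the runner when a scope ends; cleans up in reverse registration order.
    void exit(Scope scope) {
        for (int i = hooks.size() - 1; i >= 0; i--) {
            ScopedLifecycle hook = hooks.get(i);
            if (hook.scope() == scope) {
                hook.cleanup();
            }
        }
    }
}
```

With something like this, the SDK would register a JVM-scoped hook whose init() performs the
IO factory registration, and the runner would only need to call enter(Scope.JVM) once per
executor JVM.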

To unblock the user scenario, I'd be in favor of having the runner call this now on a
per-task basis, with the clear understanding that this has to be cleaned up in the future.
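
A minimal sketch of the per-task approach, assuming the runner wraps each task and guards the
call so the actual registration happens only once per JVM. PerTaskRegistration and its methods
are hypothetical, registerFactories() is a stand-in for
IOChannelUtils.registerIOFactories(options), and the once-only guard is an assumption added
here, not something this issue specifies.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class PerTaskRegistration {
    private static boolean registered = false;

    // Counts how often the stand-in registration actually ran (for illustration only).
    static final AtomicInteger REGISTRATION_COUNT = new AtomicInteger();

    // Stand-in for IOChannelUtils.registerIOFactories(options).
    private static void registerFactories() {
        REGISTRATION_COUNT.incrementAndGet();
    }

    // Safe to call before every task: later calls in the same JVM do nothing.
    static synchronized void ensureRegistered() {
        if (!registered) {
            registerFactories();
            registered = true;
        }
    }

    // Hypothetical runner-side wrapper that registers before running a task.
    static void runTask(Runnable task) {
        ensureRegistered();
        task.run();
    }
}
```

The guard keeps the per-task call cheap, and the call site is the only part that would need to
move once properly scoped init methods exist.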

> Spark executors need to register IO factories
> ---------------------------------------------
>                 Key: BEAM-1556
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-spark
>            Reporter: Frances Perry
>            Assignee: Jean-Baptiste Onofré
> The Spark executors need to call IOChannelUtils.registerIOFactories(options) in order
> to support GCS files and make the default WordCount example work.
> Context in this thread:

This message was sent by Atlassian JIRA
