beam-commits mailing list archives

From "Pablo Estrada (JIRA)" <>
Subject [jira] [Commented] (BEAM-1442) Performance improvement of the Python DirectRunner
Date Wed, 22 Feb 2017 19:31:44 GMT


Pablo Estrada commented on BEAM-1442:

For a proposal, you should include:
(1) Introduction - Introduce the project.
(2) Goals.
(3) Implementation - Describe the benchmark and the runner improvements. Be as specific and
detailed as possible. This project is not easy, and we need to see that you have a good grasp
of the different components.
(4) Timeline.
(5) Self-introduction - Introduce yourself too.

> Performance improvement of the Python DirectRunner
> --------------------------------------------------
>                 Key: BEAM-1442
>                 URL:
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py
>            Reporter: Pablo Estrada
>            Assignee: Ahmet Altay
>              Labels: gsoc2017, mentor, python
> The DirectRunners for Python and Java are intended to act as policy enforcers and correctness
> checkers for Beam pipelines, but some users run data processing tasks on them.
> Currently, the Python DirectRunner has less-than-great performance, although some work
> has gone into improving it. There are more opportunities for improvement.
> Skills for this project:
> * Python
> * Cython (nice to have)
> * Working through the Beam getting started materials (nice to have)
> To start figuring out this problem, it is advisable to run a simple pipeline and study
> the `` and `` methods. Ask questions directly on JIRA.
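
One possible way to start studying where time goes when running a simple pipeline is to profile it with Python's standard `cProfile` module. The sketch below is only an illustration and is not part of the issue: `profile_callable` and `stand_in_workload` are hypothetical names, and the workload is a placeholder for a function that would build and run a small Beam pipeline on the DirectRunner.

```python
import cProfile
import io
import pstats


def profile_callable(fn, top=10):
    """Run fn() under cProfile; return (fn's result, text report).

    Hypothetical helper for illustration; not part of Beam.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(fn)  # runcall returns whatever fn() returns
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # Sort by cumulative time so the most expensive call paths come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def stand_in_workload():
    # Assumption: placeholder work. In practice this function would
    # construct a simple pipeline and run it on the DirectRunner.
    return sum(x * 2 for x in range(100_000))


if __name__ == "__main__":
    _, report = profile_callable(stand_in_workload)
    print(report)
```

Sorting by cumulative time tends to surface the runner methods that dominate execution, which could then be examined in more detail or used as the basis for a benchmark.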

This message was sent by Atlassian JIRA
