beam-commits mailing list archives

From "Kenneth Knowles (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2516) User reports 4 minutes to process 1 million line CSV in DirectRunner
Date Tue, 12 Sep 2017 02:45:03 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162420#comment-16162420 ]

Kenneth Knowles commented on BEAM-2516:
---------------------------------------

 The actual round-trip is behind a flag, but that isn't actually the piece that causes this.
It is undoubtedly going through the portability APIs to snag out the DoFns, etc. I will see
about hacking something up in time for 2.2.0. I'll first get a solid view of the profile in
case it is just low-hanging fruit. If nontrivial, I'll do a bit more to put portability behind
a feature flag.

> User reports 4 minutes to process 1 million line CSV in DirectRunner
> --------------------------------------------------------------------
>
>                 Key: BEAM-2516
>                 URL: https://issues.apache.org/jira/browse/BEAM-2516
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-direct
>            Reporter: Kenneth Knowles
>            Assignee: Kenneth Knowles
>            Priority: Minor
>             Fix For: 2.2.0
>
>
> https://stackoverflow.com/questions/44736414/simple-apache-beam-manipulations-work-very-slow
> I don't know what the expectations are here, so I wasn't ready to say this is WAI. Low
> priority since it isn't what the runner is for anyhow, but this seems like the scale of data
> that should be snappy. Worth investigating, or maybe you can quickly indicate why it is expected?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
