beam-commits mailing list archives

From "Stephen Sisk (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-1323) Add parallelism/splitting in JdbcIO
Date Mon, 12 Jun 2017 17:40:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16046824#comment-16046824 ]

Stephen Sisk commented on BEAM-1323:
------------------------------------

Can you elaborate on what exactly the SplittingFn would do? What parameters does it take,
what does it return, how would JdbcIO use it, and for which databases do we think it would
work well?

> Add parallelism/splitting in JdbcIO
> -----------------------------------
>
>                 Key: BEAM-1323
>                 URL: https://issues.apache.org/jira/browse/BEAM-1323
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Jean-Baptiste Onofré
>            Assignee: Jean-Baptiste Onofré
>
> Currently, JdbcIO is basically a {{DoFn}} executed with a {{ParDo}}, which means that
> parallelism is limited: the whole read is executed on a single executor.
> We could create several JDBC {{BoundedSource}}s by splitting the SQL query into
> subsets (for instance using row id paging or any other "splitting/limit" we can derive
> from the original SQL query), similar to what Sqoop does.
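To make the Sqoop-style idea in the description concrete, here is a minimal sketch of range-based query splitting. It assumes the query has a numeric split column (here `id`) whose MIN and MAX were already fetched with a separate `SELECT MIN(id), MAX(id)` query, and that the base query has no WHERE clause of its own. The class `RangeSplitSketch` and the method `splitQueries` are hypothetical illustrations, not part of JdbcIO or of any existing SplittingFn API.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeSplitSketch {

  /**
   * Hypothetical helper (not part of JdbcIO): divides the closed interval [min, max]
   * of a numeric column into at most numSplits contiguous ranges and returns one
   * query per range. Assumes baseQuery has no WHERE clause of its own.
   */
  public static List<String> splitQueries(
      String baseQuery, String column, long min, long max, int numSplits) {
    if (numSplits < 1) {
      throw new IllegalArgumentException("numSplits must be >= 1");
    }
    if (min > max) {
      throw new IllegalArgumentException("min must be <= max");
    }
    // Number of distinct values in [min, max]; throws ArithmeticException on overflow.
    long span = Math.addExact(Math.subtractExact(max, min), 1);
    // Never create more ranges than there are values, so no range is empty.
    int splits = (int) Math.min(numSplits, span);
    long size = span / splits;
    long remainder = span % splits;

    List<String> queries = new ArrayList<>();
    long lo = min;
    for (int i = 0; i < splits; i++) {
      // The first `remainder` ranges take one extra value so every value is covered once.
      long count = size + (i < remainder ? 1 : 0);
      long hi = lo + count - 1; // inclusive upper bound
      queries.add(String.format(
          "%s WHERE %s >= %d AND %s <= %d", baseQuery, column, lo, column, hi));
      lo = hi + 1;
    }
    return queries;
  }

  public static void main(String[] args) {
    // Each printed query could, in principle, be read independently by a separate worker.
    for (String q : splitQueries("SELECT id, name FROM person", "id", 1, 10, 3)) {
      System.out.println(q);
    }
  }
}
```

With `min = 1`, `max = 10` and three splits, this yields the ranges 1..4, 5..7 and 8..10. Note that splitting by value range like this assumes the split column is roughly uniformly distributed; with skewed data some ranges can hold far more rows than others, so a real SplittingFn would likely need to let the user choose the column or the strategy.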



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
