flink-issues mailing list archives

From "vinoyang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-11818) Provide pipe transformation function for DataSet API
Date Fri, 08 Mar 2019 10:31:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-11818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16787763#comment-16787763 ]

vinoyang commented on FLINK-11818:

Hi [~hequn8128], in fact, my idea is not much different from Spark's current
implementation.

1) We can provide multiple overloaded methods called pipe on the DataSet object, e.g.
{{pipe(String cmd)}} / {{pipe(String cmd, Map<String, String> env)}} ... Flink feeds the
elements of the DataSet to the external program as input and returns the program's output
as a new DataSet. [1] [2]

2) I think its semantics would be similar to Spark's.
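
To make the proposed semantics concrete, here is a minimal sketch in plain Java of what a pipe call could do for one partition of records. It is not Flink code and not a proposed implementation: the class name {{PipeSketch}} and the static helper are hypothetical, the records are modelled as a {{List<String>}} instead of a DataSet, and the command is split on whitespace with no shell quoting, which is a simplifying assumption.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Flink.
public class PipeSketch {

    /**
     * Writes each input record as one line to the stdin of the external
     * command and returns each line of its stdout as one output record.
     */
    public static List<String> pipe(List<String> input, String cmd, Map<String, String> env)
            throws IOException, InterruptedException {
        // Assumption: naive whitespace tokenization, no quoting support.
        ProcessBuilder builder = new ProcessBuilder(cmd.trim().split("\\s+"));
        builder.environment().putAll(env);
        // Let the child's stderr go to the parent's stderr.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        // Feed stdin from a separate thread so that a full pipe buffer
        // cannot block us while we are reading stdout.
        Thread writer = new Thread(() -> {
            try (BufferedWriter out = new BufferedWriter(
                    new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8))) {
                for (String record : input) {
                    out.write(record);
                    out.write('\n');
                }
            } catch (IOException ignored) {
                // The child may close its stdin early; the exit status is checked below.
            }
        });
        writer.start();

        List<String> output = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                output.add(line);
            }
        }

        writer.join();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command exited with status " + exitCode + ": " + cmd);
        }
        return output;
    }
}
```

On a Unix-like system, {{PipeSketch.pipe(records, "tr a-z A-Z", env)}} would return the records upper-cased. In a real DataSet operator the same logic would presumably run once per parallel partition, so the external program is started once per partition and not once per record.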


[1]: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala]

[2]: [https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala]


What do you think? cc [~fhueske] [~till.rohrmann]


> Provide pipe transformation function for DataSet API
> ----------------------------------------------------
>                 Key: FLINK-11818
>                 URL: https://issues.apache.org/jira/browse/FLINK-11818
>             Project: Flink
>          Issue Type: Improvement
>          Components: API / DataSet
>            Reporter: vinoyang
>            Assignee: vinoyang
>            Priority: Major
> We have business requirements that need the data handled by Flink to interact
with external programs (such as Python/Perl/shell scripts). The existing DataSet API has
no such function; it can be implemented with the map function, but the result is not
concise. It would be helpful if we could provide a pipe[1] function like Spark's.
> [1]: https://spark.apache.org/docs/latest/rdd-programming-guide.html#transformations
