flink-issues mailing list archives

From "Fabian Hueske (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-2239) print() on DataSet: stream results and print incrementally
Date Mon, 16 Nov 2015 13:44:15 GMT

     [ https://issues.apache.org/jira/browse/FLINK-2239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fabian Hueske updated FLINK-2239:
    Fix Version/s:     (was: 0.10.0)

> print() on DataSet: stream results and print incrementally
> ----------------------------------------------------------
>                 Key: FLINK-2239
>                 URL: https://issues.apache.org/jira/browse/FLINK-2239
>             Project: Flink
>          Issue Type: Improvement
>          Components: Distributed Runtime
>    Affects Versions: 0.9
>            Reporter: Maximilian Michels
>             Fix For: 1.0.0
> Users find it counter-intuitive that {{print()}} on a DataSet internally calls {{collect()}}
and fully materializes the set. This leads to out-of-memory errors on the client. It also
leaves users with the feeling that Flink cannot handle large amounts of data and that it fails.
>
> Improving this situation requires some major architectural changes in Flink. The
easiest solution would probably be to transfer the data from the job manager to the client
via the {{BlobManager}}. Alternatively, the client could connect directly to the task managers
and fetch the results.
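The memory difference between materializing and incremental printing can be illustrated with a small plain-Java sketch. It assumes, hypothetically, that the result reaches the client as a sequence of chunks (for example one per partition, whether shipped via the {{BlobManager}} or fetched directly from the task managers). {{IncrementalPrintSketch}}, {{printMaterialized}} and {{printIncrementally}} are illustrative names only, not Flink APIs.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not Flink code): two ways a client could print a
 * result that arrives as a sequence of chunks, e.g. one chunk per partition.
 */
public class IncrementalPrintSketch {

    /**
     * Models the collect()-then-print behaviour: buffer every record first,
     * then print. Returns the largest number of records buffered at once.
     */
    public static int printMaterialized(Iterator<List<String>> chunks, Consumer<String> out) {
        List<String> all = new ArrayList<>();
        while (chunks.hasNext()) {
            all.addAll(chunks.next()); // the whole result accumulates on the client
        }
        int peak = all.size();
        for (String record : all) {
            out.accept(record);
        }
        return peak;
    }

    /**
     * Incremental alternative: print each chunk as soon as it is fetched and
     * then drop it. Returns the largest number of records held at once.
     */
    public static int printIncrementally(Iterator<List<String>> chunks, Consumer<String> out) {
        int peak = 0;
        while (chunks.hasNext()) {
            List<String> chunk = chunks.next(); // only the current chunk is held
            peak = Math.max(peak, chunk.size());
            for (String record : chunk) {
                out.accept(record);
            }
        }
        return peak;
    }

    public static void main(String[] args) {
        // Three hypothetical partitions of a six-record result.
        List<List<String>> partitions = List.of(
                List.of("a", "b"), List.of("c", "d", "e"), List.of("f"));
        int peakAll = printMaterialized(partitions.iterator(), System.out::println);
        int peakInc = printIncrementally(partitions.iterator(), System.out::println);
        System.out.println("materialized peak = " + peakAll + ", incremental peak = " + peakInc);
    }
}
```

Both methods print the same records in the same order; only the amount the client must hold at once differs (the whole result versus the largest single chunk), which is the property the issue asks {{print()}} to have.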

This message was sent by Atlassian JIRA
