flink-user mailing list archives

From Darshan Singh <darshan.m...@gmail.com>
Subject Need to understand the execution model of the Flink
Date Sun, 18 Feb 2018 19:11:33 GMT
Hi, I would like to understand the execution model.

1. I have a CSV file which is, say, 10 GB.
2. I created a table from this file.

3. Now I have created, say, 10 filtered tables on this.
4. Now I have created a writeToSink for each of these 10 filtered tables.

Now my question is: will these 10 filtered tables be written in
parallel (suppose I have 40 cores and set the parallelism to 40 as well)?

My next question: the table which I created from the CSV file, and which
is common to all of them, won't be persisted by Flink internally; rather,
for each of the 10 filtered tables it will read the CSV file, apply the
filter, and write to the sink.

I think that for the 10 filtered tables it will read the CSV again and
again, so in this case it will be read 10 times. Is my understanding
correct, or am I missing something?

What if in step 2 I convert the table to a DataSet and back?
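The behaviour the question is worried about can be sketched with a toy model (plain Python, not Flink code; the names Source, filtered, and the caching step are hypothetical). It assumes a lazy plan in which each sink independently pulls through its part of the pipeline, which is the scenario the question describes; whether Flink actually re-reads depends on how the optimizer combines the plans.

```python
class Source:
    """Lazily yields rows; counts how many times the 'file' is scanned."""
    def __init__(self, rows):
        self.rows = rows
        self.reads = 0

    def scan(self):
        self.reads += 1          # one full pass per downstream execution
        yield from self.rows


def filtered(source, predicate):
    # A filtered "table" is just a plan: nothing runs until a sink pulls it.
    return lambda: [r for r in source.scan() if predicate(r)]


rows = list(range(100))
src = Source(rows)

# Ten filtered tables, each written to its own sink: each sink triggers
# its own scan of the shared source, so the "CSV" is read ten times.
plans = [filtered(src, lambda r, k=k: r % 10 == k) for k in range(10)]
results = [plan() for plan in plans]       # "writeToSink" for each table
print(src.reads)                           # -> 10

# Materializing the common table first (roughly what persisting the
# intermediate result would achieve) brings it down to a single read.
src2 = Source(rows)
cached = list(src2.scan())                 # one pass, result kept in memory
results2 = [[r for r in cached if r % 10 == k] for k in range(10)]
print(src2.reads)                          # -> 1
```

In this toy model the re-read count is exactly the number of sinks unless the common intermediate result is materialized, which is the trade-off the question about converting to a DataSet and back is probing.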

