flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Darshan Singh <darshan.m...@gmail.com>
Subject Re: Need to understand the execution model of the Flink
Date Mon, 19 Feb 2018 01:19:42 GMT
Thanks for reply.

I guess I am not looking for alternate. I am trying to understand what
flink does in this scenario and if 10 tasks ar egoing in parallel I am sure
they will be reading csv as there is no other way.

Thanks

On Mon, Feb 19, 2018 at 12:48 AM, Niclas Hedhman <niclas@hedhman.org> wrote:

>
> Do you really need the large single table created in step 2?
>
> If not, what you typically do is that the Csv source first do the common
> transformations. Then depending on whether the 10 outputs have different
> processing paths or not, you either do a split() to do individual
> processing depending on some criteria, or you just have the sink put each
> record in separate tables.
> You have full control, at each step along the transformation path whether
> it can be parallelized or not, and if there are no sequential constraints
> on your model, then you can easily fill all cores on all hosts quite easily.
>
> Even if you need the step 2 table, I would still just treat that as a
> split(), a branch ending in a Sink that does the storage there. No need to
> read records from file over and over again, nor to store them first in step
> 2 table and read them out again.
>
> Don't ask *me* about what happens in failure scenarios... I have myself
> not figured that out yet.
>
> HTH
> Niclas
>
> On Mon, Feb 19, 2018 at 3:11 AM, Darshan Singh <darshan.meel@gmail.com>
> wrote:
>
>> Hi I would like to understand the execution model.
>>
>> 1. I have a csv files which is say 10 GB.
>> 2. I created a table from this file.
>>
>> 3. Now I have created filtered tables on this say 10 of these.
>> 4. Now I created a writetosink for all these 10 filtered tables.
>>
>> Now my question is that are these 10 filetered tables be written in
>> parallel (suppose i have 40 cores and set up parallelism to say 40 as well.
>>
>> Next question I have is that the table which I created form the csv file
>> which is common wont be persisted by flink internally rather for all 10
>> filtered tables it will read csv files and then apply the filter and write
>> to sink.
>>
>> I think that for all 10 filtered tables it will read csv again and again
>> in this case it will be read 10 times.  Is my understanding correct or I am
>> missing something.
>>
>> What if I step 2 I change table to dataset and back?
>>
>> Thanks
>>
>
>
>
> --
> Niclas Hedhman, Software Developer
> http://polygene.apache.org - New Energy for Java
>

Mime
View raw message