I have been reading the Twitter Heron paper, and I was a bit surprised by the allocation criticism therein, which is based on an example:
"... consider scheduling 3 spouts and 1 bolt on 2 workers. Assuming that the bolt and the spout tasks each need 10GB and 5GB of memory respectively, this topology needs to reserve a total of 15GB memory per worker since the worker has to run a bolt and a spout task. This allocation policy leads to a total of 30GB of memory for the topology, while only 25GB is actually required ...".
Please correct me if the following is wrong: assuming each worker runs on a separate machine and that a worker requires a maximum of 15 GB, both workers need to be allocated 15 GB under the default scheduler, since we do not know in advance which machine will host the single bolt together with one of the spout tasks.
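Just to make the arithmetic from the paper's example explicit (this is a toy calculation of mine, not Storm code): the default scheduler reserves the worst-case per-worker footprint on every worker, while a per-component allocation only reserves what the tasks actually need.

```python
SPOUT_GB, BOLT_GB = 5, 10
N_SPOUTS, N_BOLTS, N_WORKERS = 3, 1, 2

# Worst case: a worker runs one bolt and one spout -> 15 GB,
# reserved on both workers by the default scheduler.
worst_case_worker = BOLT_GB + SPOUT_GB
default_total = worst_case_worker * N_WORKERS

# Per-component accounting: just the sum of what the tasks need.
actual_total = N_SPOUTS * SPOUT_GB + N_BOLTS * BOLT_GB

print(default_total, actual_total)  # 30 25
```

That reproduces the 30 GB vs. 25 GB figures quoted above.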
With the ResourceAwareScheduler (RAS) we can specify memory requirements per component within the topology. How does this influence the memory allocated to each worker process, though? If one of the worker processes is configured with 10 GB of memory, would the RAS deploy the topology such that the worker process with less memory receives the two spouts? I presume that is exactly what it is meant to do...
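To illustrate the placement I am expecting, here is a toy first-fit sketch (my own assumption about the strategy, NOT Storm's actual ResourceAwareScheduler implementation): with one 15 GB worker and one 10 GB worker, the bolt-plus-spout pair fills the larger worker and the two remaining spouts land on the smaller one.

```python
def first_fit(tasks_gb, workers_gb):
    """Place each task on the first worker with enough free memory,
    trying the biggest tasks first."""
    free = list(workers_gb)
    placement = [[] for _ in workers_gb]
    for task in sorted(tasks_gb, reverse=True):
        for i, cap in enumerate(free):
            if task <= cap:
                free[i] -= task
                placement[i].append(task)
                break
        else:
            raise RuntimeError(f"no worker can fit a {task} GB task")
    return placement

# 1 bolt (10 GB) + 3 spouts (5 GB each) onto a 15 GB and a 10 GB worker.
print(first_fit([10, 5, 5, 5], [15, 10]))  # [[10, 5], [5, 5]]
```

Under this sketch the 10 GB worker ends up with exactly the two spouts, for a total of 25 GB across both workers, which is what I would hope the RAS does as well.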
Thanks in advance.