flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "wangzhijiang999" <wangzhijiang...@aliyun.com>
Subject 答复:thread model issue in TaskManager
Date Mon, 03 Aug 2015 02:12:52 GMT
Hi Stephan,Fabian
       Thank you for your reply!  I will run the flink on yarn actually . It is feasible
to isolate different tasks in one job by starting new yarn session. And it means every job
will have a yarn seesion, and one taskManager just has one slot. If I want to run all jobs
in one yarn cluster in pipelined mode, and one taskManager can run many tasks, another way
is to use process mode, that means every task will be a process not thread, so isolation is
natural. Do you think it is feasible to modify flink runtime to realize this? Or if we want
to do that, are there any suggestions?  Thank you!
------------------------------------------------------------------发件人:Stephan Ewen
<sewen@apache.org>发送时间:2015年8月3日(星期一) 00:36收件人:user <user@flink.apache.org>抄 送:wangzhijiang999
<wangzhijiang999@aliyun.com>主 题:Re: thread model issue in TaskManagerHere are
some additional things you can do:  - For isolation between parallel tasks (within a job),
start your YARN job such that each TaskManager has one slot, and start many TaskManagers.
That is a bit less efficient (but not much) than fewer TaskManagers with more slots. (*) 
- If you need to isolate successor tasks in a job against predecessor tasks, you can select
"batch" execution mode. By default, the system uses "pipelined" execution mode. In a MapReduce
case, this means that mappers and reducers run concurrently. With "batch" mode, reducers run
only after all mappers finished.Greetings,Stephan(*) The reason why multiple slots in one
TaskManager are more efficient is that TaskManagers multiplex multiple data exchanges of a
shuffle through a TCP connection, reducing per-exchange overhead and usually increasing throughput.As
Fabian suggested, YARN is a good way to go for isolation (it actually isolates more than a
JVM, which is very nice).On Thu, Jul 30, 2015 at 12:10 PM, Fabian Hueske <fhueske@gmail.com>
wrote:Hi,it is currently not possible to isolate tasks that consume a lot of JVM heap memory
and schedule them to a specific slot (or TaskManager).If you operate in a YARN setup, you
can isolate different jobs from each other by starting a new YARN session for each job, but
tasks within the same job cannot be isolated from each other right now.Cheers, Fabian2015-07-30
4:02 GMT+02:00 wangzhijiang999 <wangzhijiang999@aliyun.com>:As I know, flink uses thread
model in TaskManager, that means one taskmanager process may run many different operator threads,and
these threads will compete the memory of the process. I know that flink has memoryManage component
in each taskManager, and it will control the localBufferPool of InputGate, ResultPartition
for each task,but if UDF consume much memory, it will use jvm heap memory, so it can not
be controlled by flink. If I use flink as common platform, some users will consume much memory
in UDF, and it may influence other threads in the process, especially for OOM.  I know that
it has sharedslot or isolated slot properties , but it just limit the task schedule in one
taskmanager, can i schedule task in separate taskmanger if i consume much memory and donot
want to influence other tasks. Or are there any suggestions for the issue of thread model.
As I know spark is also thread model, but hadoop2 use process model.Thank you for any suggestions
in advance!

Mime
View raw message