samza-dev mailing list archives

From Lukas Steiblys <>
Subject Re: Multithreading ThreadJobFactory
Date Tue, 20 Oct 2015 14:14:09 GMT

This is for the case when you don't use YARN. ThreadJob runs everything
locally in one process, and right now it simply spins up a single thread
for all tasks.
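To make "a single thread for all tasks" concrete, here is a minimal sketch, assuming a simplified task interface. `SingleThreadRunLoop`, `Task` and `runRounds` are hypothetical names used only for illustration; this is not Samza's actual run loop or `StreamTask` API. It only shows that every task's work lands on the same thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Samza code: one thread drives every task in turn.
public class SingleThreadRunLoop {
    /** Hypothetical stand-in for a task that processes one message at a time. */
    interface Task {
        void process(String message);
    }

    /** Runs all tasks on the calling thread, one message per task per round. */
    static void runRounds(List<Task> tasks, int rounds) {
        for (int round = 0; round < rounds; round++) {
            for (Task task : tasks) {
                task.process("msg-" + round);
            }
        }
    }

    public static void main(String[] args) {
        List<String> seenThreads = new ArrayList<>();
        List<Task> tasks = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            // Each task records the name of the thread it was called on.
            tasks.add(message -> seenThreads.add(Thread.currentThread().getName()));
        }
        runRounds(tasks, 2);
        // prints "6 calls, distinct threads: 1"
        System.out.println(seenThreads.size() + " calls, distinct threads: "
            + seenThreads.stream().distinct().count());
    }
}
```

With this model, a slow task delays every other task, which is the limitation the proposal quoted below tries to remove.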


On 10/20/15, Kartik Paramasivam <> wrote:
> We have been wanting to do something similar at LinkedIn. We haven't,
> however, thought through the details.
> If container == thread, then we would need to change the AppMaster to
> request the appropriate number of YARN 'containers' (processes); i.e. we
> would have to decouple the process count from yarn.Containers.Count.
> Basically, wouldn't we have to come up with a new setting,
> Yarn.ProcessCount?
> On Mon, Oct 19, 2015 at 3:49 PM, Lukas Steiblys <> wrote:
>> I have been thinking lately about the least invasive way to add
>> multithreading capabilities to ThreadJobFactory, as that is the main way
>> we run our jobs in production. Looking at the master branch code in Git,
>> I have found the following:
>>   a. The best way would be to simply spin up a new thread for each
>>      container.
>>   b. The number of containers can already be specified using the
>>      configuration property job.container.count.
>>   c. I can construct a new SamzaContainer for each ContainerModel
>>      returned from coordinator.jobModel.getContainers in ThreadJobFactory.
>>   d. I can pass a list of these containers into the ThreadJob constructor,
>>      modifying it to accept an array of Runnables.
>>   e. For each Runnable, the submit method of ThreadJob would create a new
>>      thread and start it.
>> This should start up a new thread for each container and group the tasks
>> using the appropriate TaskNameGrouper.
>> Any ideas on what I might have missed? Are there any other potential
>> solutions? Would this be a good patch for Samza in general?
>> Lukas
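Steps a through e, plus the grouping of tasks, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not Samza code: `MultiThreadJob`, `groupRoundRobin`, `submit` and `waitForFinish` are hypothetical names, the round-robin grouping is only a stand-in for whatever TaskNameGrouper is configured (real groupers may assign tasks differently), and plain `Runnable`s stand in for the SamzaContainer instances that step c would build from each ContainerModel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the proposal; none of these names are Samza APIs.
public class MultiThreadJob {
    private final List<Runnable> containers;
    private final List<Thread> threads = new ArrayList<>();

    /** Step d: the job accepts one Runnable per container. */
    MultiThreadJob(List<Runnable> containers) {
        this.containers = containers;
    }

    /** Stand-in for a TaskNameGrouper: deal task names round-robin into containerCount groups. */
    static List<List<String>> groupRoundRobin(List<String> taskNames, int containerCount) {
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < containerCount; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < taskNames.size(); i++) {
            groups.get(i % containerCount).add(taskNames.get(i));
        }
        return groups;
    }

    /** Step e: start one named thread per container Runnable. */
    void submit() {
        for (int i = 0; i < containers.size(); i++) {
            Thread t = new Thread(containers.get(i), "container-" + i);
            threads.add(t);
            t.start();
        }
    }

    /** Blocks until every container thread has finished. */
    void waitForFinish() throws InterruptedException {
        for (Thread t : threads) {
            t.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> tasks = List.of("Partition 0", "Partition 1", "Partition 2",
            "Partition 3", "Partition 4");
        int containerCount = 2; // assumed to come from job.container.count
        List<List<String>> groups = groupRoundRobin(tasks, containerCount);

        // Records which thread handled each task.
        Map<String, String> taskToThread = new ConcurrentHashMap<>();
        List<Runnable> containers = new ArrayList<>();
        for (List<String> group : groups) {
            // Stand-in for a SamzaContainer built from one ContainerModel.
            containers.add(() -> {
                for (String task : group) {
                    taskToThread.put(task, Thread.currentThread().getName());
                }
            });
        }

        MultiThreadJob job = new MultiThreadJob(containers);
        job.submit();
        job.waitForFinish();
        // prints "[[Partition 0, Partition 2, Partition 4], [Partition 1, Partition 3]]"
        System.out.println(groups);
        // prints "container-0 container-1"
        System.out.println(taskToThread.get("Partition 0") + " " + taskToThread.get("Partition 1"));
    }
}
```

Keeping the job's interface as a list of `Runnable`s means the job itself stays unaware of Samza internals; all knowledge of how tasks map to containers lives in the grouping step, which matches the idea of reusing the existing TaskNameGrouper machinery.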
