samza-dev mailing list archives

From Kartik Paramasivam <kparamasi...@linkedin.com.INVALID>
Subject Re: Multithreading ThreadJobFactory
Date Tue, 20 Oct 2015 07:28:14 GMT
We have been wanting to do something similar at LinkedIn. However, we
haven't thought through the details yet.

If container == thread, then we would need to change the AppMaster to
request the appropriate number of Yarn 'containers' (processes), i.e. we
would have to decouple the process count from yarn.Containers.Count.

Basically, wouldn't we have to come up with a new setting,
Yarn.ProcessCount?
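
To make the decoupling concrete, here is a rough sketch of how a fixed
number of processes could each host several containers as threads. It is
only an illustration: the class ProcessAssignmentSketch, the method assign,
and the round-robin policy are assumptions, and Yarn.ProcessCount is the
proposed setting, not an existing one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Samza: maps container ids onto processes.
class ProcessAssignmentSketch {

    // Returns one list per process, holding the container ids that process
    // would run as threads. containerCount plays the role of
    // job.container.count; processCount plays the role of the proposed
    // (not yet existing) Yarn.ProcessCount setting.
    static List<List<Integer>> assign(int containerCount, int processCount) {
        if (containerCount < 0 || processCount <= 0) {
            throw new IllegalArgumentException(
                "containerCount must be >= 0 and processCount must be > 0");
        }
        List<List<Integer>> byProcess = new ArrayList<>();
        for (int p = 0; p < processCount; p++) {
            byProcess.add(new ArrayList<>());
        }
        // Round-robin: container c goes to process c % processCount.
        for (int c = 0; c < containerCount; c++) {
            byProcess.get(c % processCount).add(c);
        }
        return byProcess;
    }
}
```

With 5 containers and 2 processes, the AppMaster would request 2 Yarn
containers, and the processes would run containers 0, 2, 4 and 1, 3
respectively.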

On Mon, Oct 19, 2015 at 3:49 PM, Lukas Steiblys <lukas@doubledutch.me>
wrote:

> I have been thinking lately about the least invasive way to add
> multithreading capabilities to ThreadJobFactory, as that is the main way
> we run our jobs in production. Looking at the master branch code in Git, I
> have found the following:
>   a. The best way would be to simply spin up a new thread for each
> container.
>   b. The number of containers can already be specified using the
> configuration property job.container.count.
>   c. I can construct a new SamzaContainer for each containerModel
> returned from coordinator.jobModel.getContainers in ThreadJobFactory.
>   d. I can pass a list of these containers into the ThreadJob constructor,
> modifying it to accept an array of Runnables.
>   e. For each runnable, it would create a new thread and start it in the
> submit method of ThreadJob.
> This should start up a new thread for each container and group the tasks
> using the appropriate TaskNameGrouper.
>
> Any ideas on what I might have missed? Are there any other potential
> solutions? Would this be a good patch for Samza in general?
>
> Lukas
>
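
For reference, steps d and e of the quoted proposal could look roughly like
the sketch below. It deliberately avoids the real Samza classes: plain
Runnables stand in for SamzaContainer instances, and MultiThreadJobSketch,
submit, waitForFinish, threadCount and placeholderContainers are all
hypothetical names, not Samza's actual ThreadJob API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a ThreadJob that accepts several Runnables.
// It is not Samza's actual ThreadJob class.
class MultiThreadJobSketch {
    private final List<Runnable> containers;
    private final List<Thread> threads = new ArrayList<>();

    // Step d: take a list of runnables, one per container.
    MultiThreadJobSketch(List<Runnable> containers) {
        this.containers = new ArrayList<>(containers);
    }

    // Step e: create and start one thread per container runnable.
    MultiThreadJobSketch submit() {
        for (int i = 0; i < containers.size(); i++) {
            Thread t = new Thread(containers.get(i), "container-" + i);
            threads.add(t);
            t.start();
        }
        return this;
    }

    // Block until every container thread has exited.
    void waitForFinish() {
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    // Number of threads started so far.
    int threadCount() {
        return threads.size();
    }

    // Builds placeholder runnables, one per container; each one only
    // increments the shared counter so that a caller can see it ran.
    static List<Runnable> placeholderContainers(int count, AtomicInteger ran) {
        List<Runnable> result = new ArrayList<>();
        for (int id = 0; id < count; id++) {
            result.add(ran::incrementAndGet);
        }
        return result;
    }
}
```

In a real patch the runnables would be the SamzaContainer instances built
from each containerModel (step c), and error handling and shutdown would
need more thought than this sketch gives them.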
