samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yi Pan <nickpa...@gmail.com>
Subject Re: Multithreading ThreadJobFactory
Date Wed, 28 Oct 2015 18:30:59 GMT
Hi, Lukas,

Sorry to reply late. My comments below

On Tue, Oct 20, 2015 at 7:14 AM, Lukas Steiblys <lukas@doubledutch.me>
wrote:

>
> >> I have been thinking lately about the most non-invasive way to add
> >> multithreading capabilities to ThreadJobFactory, as that is the main
> >> method
> >> we run our jobs in production. Looking at the master branch code in Git,
> >> I
> >> have found the following:
> >>   a.. The best way would be to simply spin up a new thread for each
> >> container.
> >>   b.. The number of containers can already be specified using the
> >> configuration property job.container.count.
> >>   c.. I can construct a new SamzaContainer for each containerModel
> >> returned from coordinator.jobModel.getContainers in ThreadJobFactory.
>

I like this idea. So I guess this would be a simplified version of
parallelized Samza jobs within one JVM. It could be useful for users that
need multiple parallel threads but does not need to scale the job across
multiple machines. We do have a plan to work on a similar multi-thread
model in a distributed environment in LinkedIn. The main difference here
would be the coordinator would need to have a distributed version of
implementation to include partition assignment, checkpoint, changelog
initialization. It would be ideal if the coordinator's API can stay the
same in both single node mode and distributed mode. I would be able to
comment more when we have a sketchy design doc.


> >>   d.. I can pass a list of these containers into ThreadJob constructor
> >> modifying it to accept an array of Runnables.
> >>   e.. For each runnable, it would create a new thread and start it in
> the
> >> submit method of ThreadJob.
> >> This should start up a new thread for each container and group the tasks
> >> using the appropriate TaskNameGrouper.
>

Wouldn't it make more sense to have the TaskNameGrouper be part of the
coordinator logic as well? And each container will just take a
ContainerModel to start. That way, the container logic would be much more
simplified.


> >>
> >> Any ideas on what I might have missed? Are there any other potential
> >> solutions? Would this be a good patch for Samza in general?
> >>
> >> Lukas
> >>
> >
>

Definitely worth to push forward in that direction. This is aligned w/ our
effort to build a standalone execution model of Samza.

Thanks a lot for putting so much thoughts in this direction.

-Yi

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message