samza-dev mailing list archives

From Jagadish Venkatraman
Subject Re: Question about JobFactories
Date Wed, 07 Dec 2016 20:20:19 GMT
Hi Nicolas,

Samza on Yarn (or rather, running multiple containers) provides you with:

1. *Fault tolerance:* When a machine fails, the YarnJob implementation
restarts your tasks / containers on another machine.
2. *Parallelism:* With multiple containers, you can partition your work
among them.

If your use case doesn't need the above, or you have other
ways to ensure container liveness using external frameworks (like
Kubernetes or Marathon on Mesos), then please feel free to use
ProcessJobFactory or ThreadJobFactory.
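
A minimal sketch of that choice in a job's properties file, assuming the
fully qualified class names shipped with Samza at the time (verify them
against the configuration table for your version):

```properties
# Run the job as a local process, with no YARN dependency.
# Restarting it on failure is then up to the external framework.
job.factory.class=org.apache.samza.job.local.ProcessJobFactory

# Alternatives (assumed class names; use one instead of the line above):
# job.factory.class=org.apache.samza.job.local.ThreadJobFactory
# job.factory.class=org.apache.samza.job.yarn.YarnJobFactory
```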

Netflix gave a talk at
one of our community meetups on how they were running Samza in production
without Yarn.

We'd also love to have you contribute back your experience running Samza in
other environments.


On Wed, Dec 7, 2016 at 2:43 AM, Nicolas Colomer wrote:

> Hello community,
> As mentioned in the documentation
> <
> jobs/configuration-table.html>,
> the *job.factory.class* configuration has 3 possible values:
> *ThreadJobFactory*, *ProcessJobFactory* and *YarnJobFactory*. For the first
> two, we can read in the description: "This is intended only for
> development, not for production deployments.".
> It may be a dumb question, but can you elaborate on the reasons for this?
> For instance, we have some Samza jobs that use only one task and are
> enough to sustain production traffic. We wonder why we could not run those
> jobs in a standalone mode (over Mesos for instance) instead of relying on a
> YARN cluster.
> Thanks for your answers.
> Nicolas

Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University
