samza-dev mailing list archives

From Jagadish Venkatraman <jagadish1...@gmail.com>
Subject Re: question on yarn.container.cpu.cores
Date Thu, 19 Nov 2015 17:07:56 GMT
Hi Chen,

Yes, theoretically, it's possible and it'll certainly increase parallelism.

You should be careful about auto-commit when you implement this. Say you
have auto-commit turned on and do a time-consuming operation for certain
types of messages in your process method (like a long computation or a
remote call), and you return from process immediately after submitting the
message to your threadpool. If Samza's auto-commit kicks in and commits
the message offset, and the job then fails before your threadpool completes,
you've lost the message you submitted to the threadpool (since, after a
restart, jobs resume from the checkpoint).

You should turn auto-commit off and call task.commit yourself when all your
threadpool tasks complete (maybe in the window method). That way, you'll
achieve parallelism without losing messages. However, you may shoot yourself
in the foot if it's not implemented right ;-)
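
A minimal sketch of that pattern in plain Java, for illustration only. It
does not use the Samza API: ParallelTaskSketch, its process and window
methods, and the commit callback are hypothetical stand-ins for your task's
process method, window method, and manual commit call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: submit work in process(), commit in window() once all work is done. */
class ParallelTaskSketch {
    private final ExecutorService pool;
    // Futures for work submitted since the last commit. Only touched by the
    // single thread that calls process() and window().
    private final List<Future<?>> inFlight = new ArrayList<>();
    // Hypothetical stand-in for the manual commit call (auto-commit is off).
    private final Runnable commit;

    ParallelTaskSketch(int threads, Runnable commit) {
        this.pool = Executors.newFixedThreadPool(threads);
        this.commit = commit;
    }

    /** Stand-in for the process method: hand off the slow work and return. */
    void process(Runnable slowWork) {
        inFlight.add(pool.submit(slowWork));
    }

    /** Stand-in for the window method: wait for all work, then commit. */
    void window() {
        try {
            for (Future<?> f : inFlight) {
                f.get(); // blocks until done; throws if the work itself failed
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted before commit", e);
        } catch (ExecutionException e) {
            // Do not commit: offsets stay at the last checkpoint, so the
            // failed messages are replayed after a restart.
            throw new IllegalStateException("work failed, not committing", e);
        }
        inFlight.clear();
        commit.run(); // safe: everything submitted so far has completed
    }

    void close() {
        pool.shutdown();
    }
}
```

The key property is that the commit only runs after every submitted task
has finished, so a crash can at worst replay messages, never drop them.
Your slow work then has to tolerate being run more than once.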




On Wed, Nov 18, 2015 at 8:40 PM, Chen Song <chen.song.82@gmail.com> wrote:

> Thanks Navina
>
> So theoretically I can create a thread pool within a container. I know it
> is very hacky but it should increase parallelism.
>
> Chen
>
> On Mon, Nov 16, 2015 at 5:49 PM, Navina Ramesh <nramesh@linkedin.com> wrote:
>
> > Hi Chen,
> > Samza container is still single threaded. In case of yarn based
> > deployment, Samza uses this config value to verify that the cluster has
> > sufficient capacity to support running your job.
> >
> > Apart from this verification, I don't believe we utilize this config
> > value. If you set it to > 1, it won't have any effect on the Samza job
> > execution itself. However, you may end-up under-utilizing your Yarn
> > cluster resources.
> >
> > HTH!
> > Navina
> >
> > On Mon, Nov 16, 2015 at 2:32 PM, Chen Song <chen.song.82@gmail.com> wrote:
> >
> > > According to the documentation, each Samza container is single
> > > threaded. Why giving yarn.container.cpu.cores as a config option and
> > > what is the implication to set this to a value > 1?
> > >
> > > --
> > > Chen Song
> > >
> >
> >
> >
> > --
> > Navina R.
> >
>
>
>
> --
> Chen Song
>



-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University
