samza-dev mailing list archives

From Chris Riccomini <criccom...@apache.org>
Subject Re: Question on nullEnvelop
Date Fri, 06 Feb 2015 20:39:21 GMT
Hey Jae,

> If so, what's the best way to shutdown the container without using command
> topic?

YARN does send a SIGTERM before SIGKILL. The YARN config that sets the delay
between the two signals is:

  yarn.nodemanager.sleep-delay-before-sigkill.ms

The default is 250ms. Samza does *not* currently handle the SIGTERM
gracefully (it doesn't shut itself down). The ticket to do this is here:

  https://issues.apache.org/jira/browse/SAMZA-506

If you'd like to work on that patch, that should make SIGTERM-based shutdown
work. If not, yes, you'll have to use some form of a shutdown command. Zach
(the guy who opened the JIRA) was able to hack around this himself by adding
a shutdown hook. You could do something similar, if you want: add a shutdown
hook that sets a variable, have window() check the variable every N ms, and
call coordinator.shutdown if it's set to true. You'd probably also have to
raise the delay in YARN to more than 250ms, so the container has time to
finish before the SIGKILL arrives.

Options:

1. Use a topic like samza_command.
2. Fix SAMZA-506.
3. Write a custom shutdown hook with a static variable.
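
For option 3, here is a minimal sketch of the shutdown-hook approach, not a
tested Samza patch. ShutdownFlag, ShutdownCoordinator, and GracefulWindowTask
are hypothetical names; ShutdownCoordinator only stands in for the
coordinator.shutdown call so the sketch compiles without Samza on the
classpath. One thing to keep in mind: the JVM halts as soon as all shutdown
hooks return, so the hook below also waits (for a bounded time) until the task
reports that it has closed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical stand-in for the single coordinator call this sketch needs. */
interface ShutdownCoordinator {
    void shutdown();
}

/** Static flag set by a JVM shutdown hook (which runs on SIGTERM). */
final class ShutdownFlag {
    private static final AtomicBoolean REQUESTED = new AtomicBoolean(false);
    private static final AtomicBoolean HOOK_INSTALLED = new AtomicBoolean(false);
    private static final CountDownLatch DONE = new CountDownLatch(1);

    private ShutdownFlag() {}

    /** Registers the hook once; it sets the flag, then waits for markDone(). */
    static void installHook(long maxWaitMs) {
        if (!HOOK_INSTALLED.compareAndSet(false, true)) {
            return;
        }
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            request();
            try {
                // Keep the JVM alive until the task has closed, or give up.
                DONE.await(maxWaitMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "shutdown-flag-hook"));
    }

    static void request() { REQUESTED.set(true); }
    static boolean isRequested() { return REQUESTED.get(); }
    static void markDone() { DONE.countDown(); }
}

/** Shape of a task whose window() polls the flag. */
final class GracefulWindowTask {
    GracefulWindowTask(long maxWaitMs) {
        ShutdownFlag.installHook(maxWaitMs);
    }

    /** Call from window(); returns true if a shutdown was requested. */
    boolean window(ShutdownCoordinator coordinator) {
        if (ShutdownFlag.isRequested()) {
            coordinator.shutdown();
            return true;
        }
        return false;
    }

    /** Call when the task is closed so the hook stops waiting. */
    void close() {
        ShutdownFlag.markDone();
    }
}
```

In a real job the check would live inside the task's window() method and use
the coordinator passed to it. The hook's wait has to be shorter than
yarn.nodemanager.sleep-delay-before-sigkill.ms, and the window interval
shorter still, otherwise the SIGKILL arrives first.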

> Does it hurt overall processing performance? I don't think so, but I
> want to confirm.

Nope, shouldn't. It only sleeps during "idle" time (no messages available).
When there are messages available, you shouldn't get null_envelopes (unless
you have a custom MessageChooser that withholds available messages, which I
doubt you do).
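
To make the idle behaviour concrete, here is a rough sketch of the pattern,
not Samza's actual code. IdlePoller is a hypothetical class that models the
message buffer as a plain BlockingQueue: the first empty poll returns null
immediately, and only the poll after a null blocks, for at most the timeout
(10ms in the default discussed in this thread).

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of "poll, and block briefly only when idle". */
final class IdlePoller<T> {
    private final BlockingQueue<T> buffer = new LinkedBlockingQueue<>();
    private final long idleTimeoutMs;
    private boolean lastWasNull = false;

    IdlePoller(long idleTimeoutMs) {
        this.idleTimeoutMs = idleTimeoutMs;
    }

    /** Producer side: make a message available. */
    void offer(T message) {
        buffer.add(message);
    }

    /**
     * Returns the next message, or null if none is available.
     * After a null result, the next call waits up to idleTimeoutMs,
     * which prevents a tight loop while the input is idle.
     */
    T choose() {
        T message;
        try {
            message = lastWasNull
                    ? buffer.poll(idleTimeoutMs, TimeUnit.MILLISECONDS)
                    : buffer.poll();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            message = null;
        }
        lastWasNull = (message == null);
        return message;
    }
}
```

While messages keep arriving, choose() never takes the blocking branch, so the
timeout adds no delay to a busy stream. When a message shows up during an idle
wait, the blocking poll returns it right away instead of sleeping out the full
timeout.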

Cheers,
Chris

On Fri, Feb 6, 2015 at 12:30 PM, Bae, Jae Hyeon <metacret@gmail.com> wrote:

> What I am doing is, consuming two topics, samza_input and samza_command.
> samza_command will have some control command something like "shutdown,all"
> because kill-yarn-job.sh does not gracefully shutdown SamzaContainer. Am I
> correct? If so, what's the best way to shutdown the container without using
> command topic?
>
> 10ms explains why 50 null envelopes were consumed per second. Does it hurt
> overall processing performance? I don't think so, but I want to confirm.
>
> Thank you
> Best, Jae
>
> On Fri, Feb 6, 2015 at 12:16 PM, Chris Riccomini <criccomini@apache.org>
> wrote:
>
> > Hey Jae,
> >
> > SamzaContainer polls for new messages by calling
> > consumerMultiplexer.choose. In a case where there are no messages
> > available, choose will return null. The next time choose is called, it
> > will be invoked with a timeout (the default is 10ms). This time, the poll
> > call will block until either 1) the timeout is hit or 2) there is a new
> > message available to process. This is to prevent a tight loop.
> >
> > > its frequency is too high, in my testing environment, it's more than 50
> > > per second.
> >
> > Why do you think this is too high? It either has to do this, or sleep for
> > longer. The longer the container sleeps, the more latency that's
> > introduced when there *is* a message available. 10ms is what we use by
> > default.
> >
> > Cheers,
> > Chris
> >
> > On Fri, Feb 6, 2015 at 11:11 AM, Bae, Jae Hyeon <metacret@gmail.com>
> > wrote:
> >
> > > Could you explain why consumerMultiplexer.choose returns null?
> > >
> > > Can it happen when there's no message in the kafka topic?
> > >
> > > If my theory is correct, its frequency is too high, in my testing
> > > environment, it's more than 50 per second.
> > >
> > > Thank you
> > > Best, Jae
> > >
> >
>
