samza-dev mailing list archives

From Tommy Becker
Subject Long changelog restoration times?
Date Thu, 10 Mar 2016 15:42:40 GMT
One of our Samza jobs restarted overnight recently, and we noticed that restoration from the
changelog took much longer than we would expect (well over an hour). Looking through the logs,
the throughput initially seems reasonable, if not stellar. But nearly every container seems
to encounter one or more long pauses during the restoration process:

2016-03-09 03:50:09,940 (default) [main] INFO  []
 - 56000000 entries restored...
2016-03-09 03:50:19,895 (default) [main] INFO  []
 - 57000000 entries restored...
2016-03-09 03:51:41,310 (default) [main] INFO  []
 - 58000000 entries restored...
2016-03-09 04:22:13,003 (default) [main] INFO  []
 - 59000000 entries restored...

Here there is a gap of roughly 30 minutes between two consecutive progress messages. As far as
we can tell, Kafka is healthy during this period, and other containers are making progress
restoring their partitions around this time, so the "gaps" are not happening at the same time
across containers. We are running Samza 0.9.1 on a YARN cluster in AWS, so some variance in
performance is to be expected, but this seems pretty extreme. Is anyone else seeing this behavior?
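
For anyone who wants to check their own container logs for similar pauses, here is a minimal
Python sketch that measures the time between consecutive "entries restored" messages. It
assumes the timestamp format shown in the excerpt above and that each progress message appears
on the same line as its timestamp or on the line after it. The names parse_progress and
find_gaps are hypothetical helpers for illustration, not part of Samza.

```python
import re
from datetime import datetime, timedelta

# Assumed log layout: "YYYY-MM-DD HH:MM:SS,mmm" at the start of a line,
# with the "N entries restored..." text on that line or a following one.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")
COUNT_RE = re.compile(r"(\d+) entries restored")

SAMPLE_LOG = """\
2016-03-09 03:50:09,940 (default) [main] INFO  []
 - 56000000 entries restored...
2016-03-09 03:50:19,895 (default) [main] INFO  []
 - 57000000 entries restored...
2016-03-09 03:51:41,310 (default) [main] INFO  []
 - 58000000 entries restored...
2016-03-09 04:22:13,003 (default) [main] INFO  []
 - 59000000 entries restored...
"""


def parse_progress(lines):
    """Return a list of (timestamp, entries_restored) pairs."""
    records = []
    current_ts = None
    for line in lines:
        ts_match = TS_RE.match(line)
        if ts_match:
            # Remember the most recent timestamp seen.
            current_ts = datetime.strptime(ts_match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        count_match = COUNT_RE.search(line)
        if count_match and current_ts is not None:
            records.append((current_ts, int(count_match.group(1))))
    return records


def find_gaps(records, threshold=timedelta(minutes=5)):
    """Return (start, end, duration, entries) for pauses at or above threshold."""
    gaps = []
    for (t0, n0), (t1, n1) in zip(records, records[1:]):
        if t1 - t0 >= threshold:
            gaps.append((t0, t1, t1 - t0, n1 - n0))
    return gaps


if __name__ == "__main__":
    for start, end, duration, entries in find_gaps(parse_progress(SAMPLE_LOG.splitlines())):
        print(f"{start} -> {end}: {duration} for {entries} entries")
```

Running this over each container's log and comparing the reported intervals would show whether
the pauses overlap across containers or, as observed here, occur at different times.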

Tommy Becker
Senior Software Engineer