kafka-users mailing list archives

From Andrew Otto <o...@wikimedia.org>
Subject Re: Disk space - sharp increase in usage
Date Tue, 02 Jun 2020 13:00:51 GMT
WMF recently had an issue
<https://phabricator.wikimedia.org/T250133#6063641> where Kafka broker
disks were filling up with log segment data.  It turned out that Kafka was
not deleting old log segments because the oldest log segment contained a
message with a Kafka timestamp a year in the future.  Time-based retention
deletes segments oldest-first, based on each segment's largest timestamp, so
that one future-dated message kept the oldest segment, and every segment
behind it, from ever becoming eligible under retention.ms.  We mitigated
this by setting retention.bytes, which applies independently of retention.ms
and allowed Kafka to prune old logs.  For us, a recurrence can be prevented
by setting message.timestamp.difference.max.ms.
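For reference, both the mitigation and the prevention above can be applied per
topic with kafka-configs.sh. This is only a sketch: the ZooKeeper address,
topic name, and the 100 GiB / 1 hour limits are placeholders, not what WMF
actually used.

```shell
# Sketch only: zk1:2181, my_topic, and the limits below are placeholders.

# Mitigation: add a size cap so old segments can still be pruned even when
# a future-dated timestamp blocks time-based (retention.ms) deletion.
kafka-configs.sh --zookeeper zk1:2181 --alter \
  --entity-type topics --entity-name my_topic \
  --add-config retention.bytes=107374182400    # 100 GiB per partition

# Prevention: reject producer messages whose timestamp differs from the
# broker's clock by more than 1 hour.
kafka-configs.sh --zookeeper zk1:2181 --alter \
  --entity-type topics --entity-name my_topic \
  --add-config message.timestamp.difference.max.ms=3600000
```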

Not sure if this is your problem, but it is at least something to check! :)

On Tue, Jun 2, 2020 at 6:26 AM Liam Clarke-Hutchinson <
liam.clarke@adscale.co.nz> wrote:

> Hi Victoria,
>
> There are no metrics recording when a config was changed. However, if you've
> been capturing the JMX metrics from the brokers, the metric
> kafka.cluster:name=ReplicasCount,partition=*,topic=*,type=Partition
> would show whether the replication factor was increased.
>
> As for retention time, if you're sure that there's not been an increase in
> data ingestion, the best metric to look at is
> kafka.log:name=LogSegments... as an increase there would be caused either by
> a large influx of data or by an increase in retention time.
>
> Lastly, check the logs and metrics for the log cleaner, in case there's any
> issues occurring preventing logs from being cleaned.
> kafka.log:name=max-clean-time-secs,type=LogCleaner
> and kafka.log:name=time-since-last-run-ms,type=LogCleanerManager would be
> most useful here.
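If JMX is enabled on the brokers, the bundled JmxTool can poll a log-cleaner
bean like those from the command line. A rough sketch, assuming JMX is exposed
on port 9999 of a host named broker1 (both are placeholders; adjust to your
setup):

```shell
# Assumes the broker was started with JMX enabled on port 9999.
# Polls the log-cleaner metric every 10 seconds and prints it as CSV.
kafka-run-class.sh kafka.tools.JmxTool \
  --jmx-url service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi \
  --object-name 'kafka.log:type=LogCleaner,name=max-clean-time-secs' \
  --reporting-interval 10000
```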
>
> The ZK logs won't be much use (ZK being where the config is stored) unless
> you had audit logging enabled, which is disabled by default.
>
> Good luck,
>
> Liam Clarke-Hutchinson
>
>
> On Tue, 2 Jun. 2020, 8:50 pm Victoria Zuberman, <
> victoria.zuberman@imperva.com> wrote:
>
> > Regarding the kafka-logs directory: that was an interesting lead, but we
> > checked and it is the same.
> >
> > Regarding replication factor and retention: I am not looking for the
> > current values, I am looking for metrics that can show me when a change
> > happened.
> >
> > Still looking for more ideas
> >
> > On 02/06/2020, 11:31, "Peter Bukowinski" <pmbuko@gmail.com> wrote:
> >
> >     > On Jun 2, 2020, at 12:56 AM, Victoria Zuberman <
> > victoria.zuberman@imperva.com> wrote:
> >     >
> >     > Hi,
> >     >
> >     > Background:
> >     > Kafka cluster
> >     > 7 brokers, with 4T disk each
> >     > version 2.3 (recently upgraded from 0.1.0 via 1.0.1)
> >     >
> >     > Problem:
> >     > Used disk space went from 40% to 80%.
> >     > Looking for root cause.
> >     >
> >     > Suspects:
> >     >
> >     >  1.  Incoming traffic
> >     >
> >     > Ruled out, according to metrics no significant change in “bytes in”
> > for topics in cluster
> >     >
> >     >  2.  Upgrade
> >     >
> >     > The rise started on the day of the upgrade to 2.3
> >     >
> >     > But we upgraded another cluster in the same way and we don’t see a
> >     > similar issue there
> >     >
> >     > Is there a known change or issue in 2.3 related to disk space
> >     > usage?
> >     >
> >     >  3.  Replication factor
> >     >
> >     > Is there a way to see whether the replication factor of any topic
> >     > was changed recently? Didn’t find in metrics...
> >
> >     You can use the kafka-topics.sh script to check the replica count for
> > all your topics. Upgrading would not have affected the replica count,
> > though.
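For example (the ZooKeeper address below is a placeholder), a quick scan of
the replication factor across all topics looks like:

```shell
# Describe every topic and keep only the summary line, which includes
# ReplicationFactor; any topic deviating from your default stands out.
kafka-topics.sh --zookeeper zk1:2181 --describe | grep 'ReplicationFactor:'
```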
> >
> >     >  4.  Retention
> >     >
> >     > Is there a way to see whether retention was changed recently?
> >     > Didn’t find in metrics...
> >
> >     You can use kafka-topics.sh --zookeeper host:2181 --describe
> >     --topics-with-overrides to list any topics with non-default
> >     retention, but I’m guessing that’s not it.
> >
> >     If your disk usage went from 40% to 80% on all brokers (effectively
> >     doubled), it could be that your Kafka data log directory path(s)
> >     changed during the upgrade. As you upgraded and restarted each
> >     broker, it would have left the existing data under the old path and
> >     created new topic partition directories and logs under the new path
> >     as it rejoined the cluster. Have you verified that your data log
> >     directory locations are the same as they used to be?
> >
> >     > Would appreciate any other ideas or investigation leads
> >     >
> >     > Thanks,
> >     > Victoria
> >     >
> >
> >
> >
>
