kafka-users mailing list archives

From Manikumar Reddy <ku...@nmsworks.co.in>
Subject Re: Log compaction not working as expected
Date Tue, 16 Jun 2015 14:17:12 GMT
OK, I see your point. Currently we check the log segment constraints
(segment.bytes, segment.ms) only before appending new messages, so a new
log segment is not created until new data arrives.

In your case, your approach (sending a periodic dummy/ping message) should be
fine.
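
To illustrate why an idle partition never becomes cleanable, here is a toy
model in Python. It is only a sketch, not the broker code: ToyLog, Segment,
and cleanable_segments are hypothetical names, and the only assumptions are
the two points from this thread (the segment.ms check runs only inside
append, and the last segment is always the active one).

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    created_ms: int
    messages: list = field(default_factory=list)


class ToyLog:
    """Toy model of one partition: the roll check runs only inside append()."""

    def __init__(self, segment_ms: int):
        self.segment_ms = segment_ms
        self.segments = []

    def append(self, now_ms: int, key: str, value: str) -> None:
        # The time-based constraint is evaluated here and nowhere else
        # (assumption taken from the description in this thread).
        if not self.segments or now_ms - self.segments[-1].created_ms >= self.segment_ms:
            self.segments.append(Segment(created_ms=now_ms))
        self.segments[-1].messages.append((key, value))

    def cleanable_segments(self) -> list:
        # Every segment except the last (active) one is eligible for cleaning.
        return self.segments[:-1]


log = ToyLog(segment_ms=300_000)
for i in range(3):
    log.append(now_ms=i * 1_000, key="ABC123", value='{"test": 1}')

# Ten minutes pass with no traffic: nothing rolls, so nothing is cleanable.
idle_cleanable = len(log.cleanable_segments())

# A single ping message triggers the roll check and frees the old segment.
log.append(now_ms=600_000, key="PING", value="")
after_ping_cleanable = len(log.cleanable_segments())

print(idle_cleanable, after_ping_cleanable)
```

In this model the three ABC123 messages stay in the active segment no matter
how much time passes, and only the ping at the ten minute mark rolls the
segment so that the older one can be cleaned.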



On Tue, Jun 16, 2015 at 7:19 PM, Shayne S <shaynest113@gmail.com> wrote:

> Thank you for the response!
>
> Unfortunately, those improvements would not help.  What prevents compaction
> is the lack of activity that would cause a new segment to be created.
>
> I was confused by what qualifies as the active segment. The active segment
> is the last segment as opposed to the segment that would be written to if
> something were received right now.
>
> On Tue, Jun 16, 2015 at 8:38 AM, Manikumar Reddy <kumar@nmsworks.co.in>
> wrote:
>
> > Hi,
> >
> >   Your observation is correct.  We never compact the active segment.
> >   Some improvements are proposed here:
> >   https://issues.apache.org/jira/browse/KAFKA-1981
> >
> >
> > Manikumar
> >
> > On Tue, Jun 16, 2015 at 5:35 PM, Shayne S <shaynest113@gmail.com> wrote:
> >
> > > Some further information, and is this a bug?  I'm using 0.8.2.1.
> > >
> > > Log compaction will only occur on the non-active segments.  Intentional or
> > > not, it seems that the last segment is always the active segment.  In other
> > > words, an expired segment will not be cleaned until a new segment has been
> > > created.
> > >
> > > As a result, a log won't be compacted until new data comes in (per
> > > partition). Does this mean I need to send the equivalent of a pig
> > > (https://en.wikipedia.org/wiki/Pigging) through each partition in order to
> > > force compaction?  Or can I force the cleaning somehow?
> > >
> > > Here are the steps to recreate:
> > >
> > > 1. Create a new topic with a 5 minute segment.ms:
> > >
> > > kafka-topics.sh --zookeeper localhost:2181 --create --topic TEST_TOPIC
> > > --replication-factor 1 --partitions 1 --config cleanup.policy=compact
> > > --config min.cleanable.dirty.ratio=0.01 --config segment.ms=300000
> > >
> > > 2. Repeatedly add messages with identical keys (3x):
> > >
> > > echo "ABC123,{\"test\": 1}" | kafka-console-producer.sh --broker-list
> > > localhost:9092 --topic TEST_TOPIC --property parse.key=true --property
> > > key.separator=, --new-producer
> > >
> > > 3. Wait 5+ minutes and confirm no log compaction.
> > > 4. Once satisfied, send a new message:
> > >
> > > echo "DEF456,{\"test\": 1}" | kafka-console-producer.sh --broker-list
> > > localhost:9092 --topic TEST_TOPIC --property parse.key=true --property
> > > key.separator=, --new-producer
> > >
> > > 5. Log compaction will occur soon after.
> > >
> > > Is my use case of infrequent logs not supported? Is this intentional
> > > behavior? It's unnecessarily challenging to target each partition with a
> > > dummy message to trigger compaction.
> > >
> > > Also, I believe there is another issue with logs originally configured
> > > without a segment timeout that led to my original issue.  I still cannot
> > > get those logs to compact.
> > >
> > > Thanks!
> > > Shayne
> > >
> >
>
