samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Shanthoosh Venkataraman <santhoshvenkat1...@gmail.com>
Subject Re: Review Request 52476: Do not load task store which are older than delete tombstones.
Date Fri, 14 Oct 2016 00:13:56 GMT


> On Oct. 5, 2016, 2:08 a.m., Jake Maes wrote:
> > Looks better, but I think there's still one major part missing. 
> > 
> > In order to have agreement between a kafka changelog and the task storage, the changelog
should be created with the same delete.retention.ms property. 
> > 
> > There are 2 ways to do this:
> > 1. (preferred) update the kafka system admin to read the samza changelog property
that you've defined (which also needs to be added to the config table, btw) and create the
topic with that value for delete.retention.ms
> > 2. Rename the property so it's one of the "topic-level-property" so it gets automatically
passed to kafka. This is convenient but wouldn't apply to other systems, which could be useful
if those other systems have a delete retention policy.

I think 1) is the only plausible way to accomplish this through job config. delete.retention.ms
configuration is associated only with stores changelog, not applicable to topics in general,
so making it topic level property might notbe a good idea. Enforcing the delete.retention.ms
property is harder to accomplish through config, since kafka is a external system. Ideally,
if there's a way to fetch kafka metadata/config(delete.retention.ms) about a topic, during
container startups we could fetch that value, rather than expecting the users to specify it.


- Shanthoosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52476/#review151439
-----------------------------------------------------------


On Oct. 4, 2016, 11:33 p.m., Shanthoosh Venkataraman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52476/
> -----------------------------------------------------------
> 
> (Updated Oct. 4, 2016, 11:33 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Every local task store is backed up by a kafka changelog topic. Due to log compaction,
delete tombstones of the changelog topic have a ttl of delete.retention.ms. Replaying the
events from the changelog that has missing delete tombstones, would result in creation of
an inconsistent local store(due to the missing of some delete events). This patch deletes
the local stores in which difference between current time and last modified time of the offset
file is greater than delete.retention.ms during the container startup.
> 
> 
> Diffs
> -----
> 
>   samza-core/src/main/java/org/apache/samza/storage/StorageRecovery.java 9329edf7d724f3a0d9235354bb77936f713e3b5f

>   samza-core/src/main/scala/org/apache/samza/config/StorageConfig.scala be3f1068f7921c9c8f8697577efea6580672813e

>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 05a996c98075ea8ed3767af666b9beeb1933f2a6

>   samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala 0b7bcdda1639eea8239a69c31bdf42558e9077d2

>   samza-core/src/test/scala/org/apache/samza/storage/TestTaskStorageManager.scala 4d40f520e54beb643acd8410c772b75e2f6a9162

> 
> Diff: https://reviews.apache.org/r/52476/diff/
> 
> 
> Testing
> -------
> 
> Unit testing and manual testing has been done to verify the functionality.
> 
> 
> Thanks,
> 
> Shanthoosh Venkataraman
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message