samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jake Maes <jma...@apache.org>
Subject Re: Review Request 52492: Adding monitor to clean up stale local stores of jobs/tasks.
Date Wed, 12 Oct 2016 17:25:23 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52492/#review152330
-----------------------------------------------------------



More thorough review this time. Feedback below.

Also, we should add documentation for this monitor, particularly if it is going to be enabled
by default. The documentation should clearly explain what the monitor does and how it's related
to delete retention in a log compacted changelog.


samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitor.java (line 82)
<https://reviews.apache.org/r/52492/#comment221325>

    I see config.getLocalStoreBaseDir called twice in quick succession. It implies that the
paths are related.
    
    Suggestion:
    Since getHostAffinityEnabledJobs() creates a file for the baseDir, why not 
    1. create that file before the for-loop
    2. pass that file into getHostAffinityEnabledJobs
    3. use that file as the first argument (parent) to create the jobDir File.



samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitor.java (lines 85 - 91)
<https://reviews.apache.org/r/52492/#comment221333>

    This code doesn't ensure that it will only delete the store for the appropriate task.
This could be a problem if there were asymmetry in the stores across tasks
    
    So the file system looks like:
    
    .../job1-1/myStoreName1/task1
    .../job1-1/myStoreName2/task2
    
    task.getStoreNames().contains("myStoreName1") will be false for task2. 
    task.getStoreNames().contains("myStoreName2") will be false for task1. 
    
    And so they will delete each other's stores. 
    
    This might be a stretch, but blindly deleting all the task dirs under a particular store
dir seems like a bug waiting to happen.
    
    It would be useful to have some tests to cover these scenarios.



samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitor.java (line 94)
<https://reviews.apache.org/r/52492/#comment221342>

    3 issues with this section:
    
    1. The big one: The way this code reads, it will delete the store for any non-running
job, completely defeating the purpose of host affinity. After staring at it for a while, I
realized the deleteTaskStore() method doesn't always delete. This is confusing and makes the
code seem destructive. I'd highly recommend a more accurate name, like "markSweepIfStale()"
    2. Do we want to delete the store for a job with status == STARTING? Feels like a race
condition.
    
    see JobStatus.hasBeenStarted()
    3. I didn't see any unit tests covering this logic.



samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitor.java (line 145)
<https://reviews.apache.org/r/52492/#comment221346>

    Thanks for the careful logging.



samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitor.java (line 158)
<https://reviews.apache.org/r/52492/#comment221324>

    I'd suggest renaming this to JobsClient for 2 reasons. 
    1. It might be something we refactor out to make it easier for users.
    2. "Util" classes are like misc. buckets. They invite developers to lazily throw methods
in that bucket rather than diligently planning them, naming them, and putting them where they
belong. This almost always turns into a mess over time.



samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitor.java (line 179)
<https://reviews.apache.org/r/52492/#comment221349>

    This seems wrong. The samza-rest service should not be using the RM port. This request
should have nothing to do with YARN.



samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitor.java (line 192)
<https://reviews.apache.org/r/52492/#comment221350>

    This seems wrong. The samza-rest service should not be using the RM port. This request
should have nothing to do with YARN.



samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitor.java (line 206)
<https://reviews.apache.org/r/52492/#comment221351>

    Why don't we just have a config "job status servers" that has a list of host:port to samza-rest
servers that can provide job status? 
    
    This would be analogous to kafka's bootstrap.servers config.
    
    Then we wouldn't need any Yarn dependency.



samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitorConfig.java (line 46)
<https://reviews.apache.org/r/52492/#comment221352>

    This must not be larger than delete.retention.ms!
    
    Slightly lower is better.
    
    So if we're defaulting delete.retention.ms to 24 hrs, I'd set this to 23-23.5
    
    Javadocs should also clearly warn of this association.


- Jake Maes


On Oct. 11, 2016, 12:53 a.m., Shanthoosh Venkataraman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52492/
> -----------------------------------------------------------
> 
> (Updated Oct. 11, 2016, 12:53 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> This patch contains the samza-rest monitor that periodically cleans up the stale local
stores of dead jobs/tasks. It performs the store deletion in two phases. Initially it deletes
the offset file in the local task stores if the following condition is true. ((jobIsNotRunning
|| preferedHost != nmHost) && offsetFilelastModifiedTime is greater than deleteRetention).
During the subsequent run, it deletes the local task stores if it does not contain offset
file. Please refer to the design doc of SAMZA-656 (https://issues.apache.org/jira/secure/attachment/12828083/DESIGN-SAMZA-656.pdf)
for more details.
> 
> 
> Diffs
> -----
> 
>   build.gradle 2bea27b75288d3103178bc3762b9556f6e69cdd1 
>   samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitor.java PRE-CREATION

>   samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitorConfig.java
PRE-CREATION 
>   samza-rest/src/main/java/org/apache/samza/monitor/YarnLocalStoreMonitorFactory.java
PRE-CREATION 
>   samza-rest/src/main/java/org/apache/samza/rest/resources/JobsResource.java a566db598c284d69ea61af88fdc0851483d5a089

>   samza-rest/src/main/java/org/apache/samza/rest/resources/ResourceConstants.java PRE-CREATION

>   samza-rest/src/test/java/org/apache/samza/monitor/TestYarnLocalStoreMonitor.java PRE-CREATION

> 
> Diff: https://reviews.apache.org/r/52492/diff/
> 
> 
> Testing
> -------
> 
> Unit testing and manual testing are done to verify the functionality.
> 
> 
> Thanks,
> 
> Shanthoosh Venkataraman
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message