samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jagadish Venkatraman <jagadish1...@gmail.com>
Subject Re: Review Request 44920: Remove tight coupling of Samza with Yarn. Define APIs for resource manager integration
Date Thu, 24 Mar 2016 21:05:27 GMT


> On March 16, 2016, 11:10 p.m., Chris Pettitt wrote:
> > Some more comments
> 
> Chris Pettitt wrote:
>     Sorry, did not mean to create issues for all of the below. I think #1 and #2 are
the most interesting to look at of the group.

I did not want to loose comments, So, I'm publishing them. I'll update the RB with the revised
draft.


> On March 16, 2016, 11:10 p.m., Chris Pettitt wrote:
> > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaAppState.java, line
83
> > <https://reviews.apache.org/r/44920/diff/1/?file=1301270#file1301270line83>
> >
> >     This should be final, especially as we have other code that is relying on it
to be a) correct and b) visible (from a thread-safety perspective) for proper shutdown.

Agreed! I've made containerCount an atomic integer. I believe that currently solves the visibility,
atomicity problem for now for that field. 

Long term, I believe SamzaAppState should be thread-safe (using copy-on-write), with private
final variables, and accessors. However, making this thread-safe/private is a change that
is more involved. (It will be a separate effort altogether - SAMZA-902 tracks this)


> On March 16, 2016, 11:10 p.m., Chris Pettitt wrote:
> > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaTaskManager.java,
line 147
> > <https://reviews.apache.org/r/44920/diff/1/?file=1301275#file1301275line147>
> >
> >     Should we wait forever or fail after some reasonable timout?

The current implementation waits for ever. I did not feel compelled to change it in this refactoring.
I think it's okay to wait - because, allocator.stop() is called just before. That sets the
flag (volatile) to false. So, the next time the allocator thread sees this flag, it will shutdown.


> On March 16, 2016, 11:10 p.m., Chris Pettitt wrote:
> > samza-core/src/main/scala/org/apache/samza/coordinator/JobModelReader.scala, line
59
> > <https://reviews.apache.org/r/44920/diff/1/?file=1301277#file1301277line59>
> >
> >     It's not obvious - I'm probably missing it - but it seems that we only write
to this field in getJobCoordinator. If that's the case then we could just move this into that
function instead of storing it here as a volatile.

I've gotten rid of it. It does not seem to be used elsewhere.


> On March 16, 2016, 11:10 p.m., Chris Pettitt wrote:
> > samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnContainerManager.java,
line 86
> > <https://reviews.apache.org/r/44920/diff/1/?file=1301281#file1301281line86>
> >
> >     Minor: may as well use a diamond here since you're using it on the next line.

Good point. Fixed.


- Jagadish


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44920/#review123928
-----------------------------------------------------------


On March 16, 2016, 6:23 p.m., Jagadish Venkatraman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/44920/
> -----------------------------------------------------------
> 
> (Updated March 16, 2016, 6:23 p.m.)
> 
> 
> Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Maes, Yi Pan (Data Infrastructure),
Navina Ramesh, and Xinyu Liu.
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> Samza currently has tight coupling with Yarn. This makes it impossible to integrate with
other resource managers like Mesos, or to run standalone without any resource manager. This
RB is a step to implementing SAMZA-881.
> 
> Design Doc: https://issues.apache.org/jira/secure/attachment/12790540/SamzaJobCoordinatorRe-designProposal.pdf
> 
> 1.Proposed new APIs for a resource manager to integrate with Samza. (SAMZA-881)
>    - Defined the ContainerProcessManager abstraction, SamzaResource, SamzaResourceRequest.

>    - Re-wrote the SamzaAppMaster into a ClusterBasedJobCoordinator.
>    - Re-wrote yarn specific request logic by abstracting it into a YarnContainerManager.

> 2.Defined a ClusterManagerConfig to handle configurations independent of Yarn/Mesos.
> 3.Made Samza's cluster interaction independent of Yarn. This separates Samza specific
components into samza-core and Yarn components into samza-yarn.
> 4.Readability improvements to the existing code base.
>    -Added docs for most methods, member variables and classes (including on thread-safety)
>    - Made internal variables final to document intent, visibility across threads. (trivially
by adding modifiers, or by changing where they're initialized.)
> 5.Refactored JobCoordinator into a JobModelReader.
> 
> TODO: Can go into the upcoming release. (P0)
> 1.Refactor the UI state variables and tests. Port some method re-orgs from SAMZA-867
into here.
> 2.Revise packaging structure.
> 4.Document new configs.
> 5.Rename run-am.sh to run-coordinator.sh, Delete all files in the non-refactored namespace.
(For unit-testing, these files continue to exist)
> 
> TODO: (P1)
> 1.Build Mesos integration for Samza. Should be simpler to integrate with the newer APIs.
>   - I started on this, and I plan to refine and post an RB in one of the hack-days.
> 2.Refactor the SamzaAppState class to provide more accessors and eliminate public variables.
(This was
> a consequence of the already existing design which I've tried to be compatible with)
> 
> TODO: I plan to track these with JIRAs so that they can be done later. (P2)
> 1.Get rid of the HTTP Server in the JobCoordinator
> 2.Make YarnJobCoordinator implement the JobCoordinator API as SAMZA-881.
> 
> 
> Diffs
> -----
> 
>   samza-core/src/main/java/org/apache/samza/clustermanager/AbstractContainerAllocator.java
PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java
PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/clustermanager/ContainerAllocator.java PRE-CREATION

>   samza-core/src/main/java/org/apache/samza/clustermanager/ContainerManagerFactory.java
PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/clustermanager/ContainerProcessManager.java
PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/clustermanager/ContainerRequestState.java
PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/clustermanager/HostAwareContainerAllocator.java
PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/clustermanager/ResourceFailure.java PRE-CREATION

>   samza-core/src/main/java/org/apache/samza/clustermanager/SamzaAppState.java PRE-CREATION

>   samza-core/src/main/java/org/apache/samza/clustermanager/SamzaContainerLaunchException.java
PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResource.java PRE-CREATION

>   samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResourceRequest.java
PRE-CREATION 
>   samza-core/src/main/java/org/apache/samza/clustermanager/SamzaResourceStatus.java PRE-CREATION

>   samza-core/src/main/java/org/apache/samza/clustermanager/SamzaTaskManager.java PRE-CREATION

>   samza-core/src/main/java/org/apache/samza/config/ClusterManagerConfig.java PRE-CREATION

>   samza-core/src/main/scala/org/apache/samza/coordinator/JobModelReader.scala PRE-CREATION

>   samza-core/src/main/scala/org/apache/samza/metrics/SamzaAppMasterMetrics.scala PRE-CREATION

>   samza-shell/src/main/bash/run-am.sh 9545a96953baaff17ad14962e02bc12aadbb1101 
>   samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnAppState.java PRE-CREATION

>   samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnContainerManager.java
PRE-CREATION 
>   samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnContainerManagerFactory.java
PRE-CREATION 
>   samza-yarn/src/main/java/org/apache/samza/job/yarn/refactor/YarnContainerRunner.java
PRE-CREATION 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/YarnAppMasterListener.scala 6bf3046a1ae4ed8f57500acae763184084ad0e09

>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/refactor/SamzaAppMasterService.scala
PRE-CREATION 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/refactor/SamzaYarnAppMasterLifecycle.scala
PRE-CREATION 
>   samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMasterLifecycle.scala
30cf34fe1fd3f74537d16e8a51b467cd50835357 
>   samza-yarn/src/test/scala/org/apache/samza/job/yarn/TestSamzaAppMasterService.scala
7f5d9f4af088589d4287c26737bae25567c157d7 
> 
> Diff: https://reviews.apache.org/r/44920/diff/
> 
> 
> Testing
> -------
> 
> 1.Tested with running test jobs in Yarn clusters of varying sizes from 1 node to 36 nodes.
> 2.Tested for failures by killing containers, and ensuring they were brought up again.
> 
> TODO:
> 1.Refactor all unit tests, and ensure updates to state variables are consistent with
the current design.
> 2.More testing with test jobs on clusters.
> 
> 
> Thanks,
> 
> Jagadish Venkatraman
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message