samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jagadish Venkatraman <jagadish1...@gmail.com>
Subject Re: Review Request 51633: SAMZA-1013: Add YARN Node label support
Date Fri, 07 Oct 2016 01:57:32 GMT


> On Oct. 5, 2016, 6:18 p.m., Jagadish Venkatraman wrote:
> > Overall, the patch looks great! This is exciting given that Samza can support scheduling
based on tags. For example, jobs with rocksdb can be assigned to nodes with SSDs.
> > 
> > 
> > Can you please add some detail on testing this feature? 
> > What was the label setup of the cluster? (for example: Did we use an exclusive node
label?), How many node labels? How many containers were requested for the job?
> 
> Maxim Logvinenko wrote:
>     We haven't tested it in production, but the main idea is the next: we have 3 different
types of nodes in our hadoop cluster. The first type is used for ApplicationMasters (actually,
we put up to 4 AM containers on one node). The second type is used for stateless jobs and
this type of nodes has a small amount of memory. And the last type is used for stateful jobs
and has more memory than others. So, there are 3 labels in our cluster: taskam, tasklowmem,
taskhighmem. Now we force YARN to put containers on a particular type of nodes by a small
trick with resources (we chose resources for node in such a way that YARN doesn't have any
other variants except only one type of nodes). But hadoop labels is a more natural way to
request containers to be placed on a specific node's type.

So, in this case, do you not care about *host affinity* at all when the job re-starts? Are
you okay with your container coming back up on a different host (as long as it is a host with
label `taskHighMem`)? We should make it explicit that when host-affinity.enabled=true, then
node labelling will be ignored. Is my understanding reasonable?


- Jagadish


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/51633/#review151519
-----------------------------------------------------------


On Oct. 7, 2016, 12:08 a.m., Maxim Logvinenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/51633/
> -----------------------------------------------------------
> 
> (Updated Oct. 7, 2016, 12:08 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1013
>     https://issues.apache.org/jira/browse/SAMZA-1013
> 
> 
> Repository: samza
> 
> 
> Description
> -------
> 
> YARN Node labels were introduced in Hadoop version 2.6, which allows to group nodes with
similar characteristics and allows applications to specify where to run. This patch adds support
for YARN node labels in Samza.
> 
> In this implementation, node labels are defined directly in yarnConfig in YarnClusterResourceManager.
It might be better to have node labels as a part of SamzaResourceRequest and SamzaResource
classes, but org.apache.hadoop.yarn.api.records.Container class doesn't contain node label
and hence we have nothing to pass to the SamzaResource constructor in onContainersAllocated
method of YarnClusterResourceManager class.
> 
> 
> Diffs
> -----
> 
>   samza-yarn/src/main/java/org/apache/samza/config/YarnConfig.java 8f2dc48 
>   samza-yarn/src/main/java/org/apache/samza/job/yarn/YarnClusterResourceManager.java
96d3d7c 
>   samza-yarn/src/main/scala/org/apache/samza/job/yarn/ClientHelper.scala 0998c43 
> 
> Diff: https://reviews.apache.org/r/51633/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Maxim Logvinenko
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message