samza-dev mailing list archives

From "Ethan Setnik" <>
Subject Re: Yarn High Availability - Samza Compatibility Plans
Date Fri, 12 Sep 2014 16:38:12 GMT
My initial tests suggest that HA failover is handled gracefully in YARN 2.4.0 without any
changes to Samza, thanks to the AMRMClient abstraction that Zhijie mentioned. FWIW, I've also
confirmed that upgrading to YARN 2.5.0 works.

The one thing that's not working completely is org.apache.samza.job.JobRunner:

$ ./gradlew samza-shell:runJob -PconfigPath=file:///usr/samza/config/

2014-09-12 16:13:08 JobRunner [INFO] job factory: org.apache.samza.job.yarn.YarnJobFactory
2014-09-12 16:13:08 ClientHelper [INFO] trying to connect to RM
2014-09-12 16:13:08 RMProxy [INFO] Connecting to ResourceManager at /

JobRunner creates a new instance of YarnJobFactory, which initializes a default
YarnConfiguration instead of reading an existing configuration from the file system.

class YarnJobFactory extends StreamJobFactory {
  def getJob(config: Config) = {
    // TODO fix this. needed to support http package locations.
    val hConfig = new YarnConfiguration
    hConfig.set("fs.http.impl", classOf[HttpFileSystem].getName)

    new YarnJob(config, hConfig)
  }
}

This default config is then used to create and submit the job.

class YarnJob(config: Config, hadoopConfig: Configuration) extends StreamJob {
  import YarnJob._

  val client = new ClientHelper(hadoopConfig)

It looks as if it's always using the default configuration. How do I tell Hadoop to load
the yarn-site.xml I have placed in /usr/hadoop/etc/hadoop instead of using the defaults?
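
To make the question concrete, here is a minimal sketch of one way the file could be loaded explicitly. It relies on two things from the Hadoop API: YarnConfiguration looks for yarn-site.xml on the classpath, and Configuration.addResource accepts a Path to a file on the local file system. The object name LoadYarnSite is hypothetical, the path is simply the one from this message, and none of this is what YarnJobFactory currently does.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.yarn.conf.YarnConfiguration

// Hypothetical standalone check; not part of Samza.
object LoadYarnSite {
  def main(args: Array[String]): Unit = {
    // YarnConfiguration registers yarn-site.xml as a default resource,
    // which is resolved from the classpath if it is present there.
    val hConfig = new YarnConfiguration

    // Assumption: the site file lives where this message says it does.
    val siteFile = new Path("/usr/hadoop/etc/hadoop/yarn-site.xml")

    // Explicitly layer the site file on top of the defaults.
    hConfig.addResource(siteFile)

    // Print the RM address the client would try to connect to.
    println(hConfig.get(YarnConfiguration.RM_ADDRESS))
  }
}
```

If the directory containing yarn-site.xml were already on the JobRunner classpath, the addResource call should be unnecessary; whether the Samza shell scripts can be made to do that is the part I have not verified.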

On Thu, Sep 11, 2014 at 6:51 PM, Zhijie Shen <> wrote:

> W.R.T HA RM, AM doesn't need to make any change if it is using
> AMRMClient(Async), which will automatically take care of the work required
> when RM failover happens.
> One thing that may interest Samza is work-preserving AM restart. If the
> Samza AM crashes for some reason, then when it restarts it can try to get
> the old containers back and continue the work. To achieve this, Samza needs
> some logic changes on the AM side.
> On Thu, Sep 11, 2014 at 2:45 PM, Chris Riccomini <> wrote:
>> Hey Ethan,
>> Yes, we plan to support this, but haven't done any testing with it yet.
>> The main question that I have is whether supporting HA RM requires Samza's
>> AM to have code changes, or whether the YARN AM client will transparently
>> handle RM failover. Until we run some tests on this (or are told
>> otherwise), I just don't know (and haven't had time to investigate). Does
>> anyone know the answer to this question?
>> Cheers,
>> Chris
>> On 9/11/14 1:55 PM, "Ethan Setnik" <> wrote:
>> >Yarn 2.4.0 brings support for high availability configurations by
>> >specifying a cluster of resource managers and a state store via zookeeper.
>> >
>> >
>> >"When there are multiple RMs, the configuration (yarn-site.xml) used by
>> >clients and nodes is expected to list all the RMs. Clients,
>> >ApplicationMasters (AMs) and NodeManagers (NMs) try connecting to the RMs
>> >in a round-robin fashion until they hit the Active RM. If the Active goes
>> >down, they resume the round-robin polling until they hit the "new"
>> >Active."
>> >
>> >
>> >Does Samza have any plans to support Yarn HA configurations?
> -- 
> Zhijie Shen
> Hortonworks Inc.