samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Riccomini <criccom...@linkedin.com.INVALID>
Subject Re: Yarn High Availability - Samza Compatibility Plans
Date Fri, 12 Sep 2014 16:40:38 GMT
Hey Ethan,

I believe that you need to export YARN_HOME=<path to yarn home dir>

The YARN_HOME dir must have a conf/yarn-site.xml in it.

Cheers,
Chris

On 9/12/14 9:38 AM, "Ethan Setnik" <ethan.setnik@mobileaware.com> wrote:

>My initial tests suggest that the HA failover is handled gracefully in
>yarn 2.4.0 without any changes to Samza due to the abstraction of
>AMRMClient as mentioned by Zhijie.  I’ve also confirmed that upgrading to
>yarn 2.5.0 works as well FWIW.
>
>
>
>
>The one thing that’s not working completely is
>org.apache.samza.job.JobRunner
>
>
>
>
>
>$ ./gradlew samza-shell:runJob
>-PconfigPath=file:///usr/samza/config/wikipedia-feed.properties
>
>...
>
>
>
>
>2014-09-12 16:13:08 JobRunner [INFO] job factory:
>org.apache.samza.job.yarn.YarnJobFactory
>
>2014-09-12 16:13:08 ClientHelper [INFO] trying to connect to RM
>0.0.0.0:8032
>
>2014-09-12 16:13:08 RMProxy [INFO] Connecting to ResourceManager at
>/0.0.0.0:8032
>
>…
>
>
>
>
>The JobRunner creates a new instance of YarnJobFactory which initializes
>a default instance of YarnConfiguration instead of reading an existing
>configuration from the file system.
>
>
>
>
>
>class YarnJobFactory extends StreamJobFactory {
>
>  def getJob(config: Config) = {
>
>    // TODO fix this. needed to support http package locations.
>
>    val hConfig = new YarnConfiguration
>
>    hConfig.set("fs.http.impl", classOf[HttpFileSystem].getName)
>
>
>
>
>    new YarnJob(config, hConfig)
>
>  }
>
>}
>
>
>
>
>This default config is then used to create and submit the job.
>
>
>
>
>
>class YarnJob(config: Config, hadoopConfig: Configuration) extends
>StreamJob {
>
>  import YarnJob._
>
>  
>
>  val client = new ClientHelper(hadoopConfig)
>
>
>
>
>
>
>
>
>It looks as if it’s always using the default configuration. How do I
>specify the environment for hadoop to load the yarn-site.xml I have
>placed in /usr/hadoop/etc/hadoop instead of using the defaults?
>
>On Thu, Sep 11, 2014 at 6:51 PM, Zhijie Shen <zshen@hortonworks.com>
>wrote:
>
>> W.R.T HA RM, AM doesn't need to make any change if it is using
>> AMRMClient(Async), which will automatically take care of the work
>>required
>> when RM failover happens.
>> One thing that may interest Samza is the work-preserving AM restarting.
>>If
>> Samza AM crash for some reason, when it restarts, it can try to get the
>>old
>> containers back, to continue the work. To achieve this, Samza needs some
>> logic optimization on the AM side.
>> On Thu, Sep 11, 2014 at 2:45 PM, Chris Riccomini <
>> criccomini@linkedin.com.invalid> wrote:
>>> Hey Ethan,
>>>
>>> Yes, we plan to support this, but haven't done any testing with it yet.
>>>
>>> The main question that I have is whether supporting HA RM requires
>>>Samza's
>>> AM to have code changes, or whether the YARN AM client will
>>>transparently
>>> handle RM failover. Until we run some tests on this (or are told
>>> otherwise), I just don't know (and haven't had time to investigate).
>>>Does
>>> anyone know the answer to this question?
>>>
>>> Cheers,
>>> Chris
>>>
>>> On 9/11/14 1:55 PM, "Ethan Setnik" <esetnik@gmail.com> wrote:
>>>
>>> >Yarn 2.4.0 brings support for high availability configurations by
>>> >specifying a cluster of resource managers and a state store via
>>>zookeeper.
>>> >
>>> >
>>> >"When there are multiple RMs, the configuration (yarn-site.xml) used
>>>by
>>> >clients and nodes is expected to list all the RMs. Clients,
>>> >ApplicationMasters (AMs) and NodeManagers (NMs) try connecting to the
>>>RMs
>>> >in a round-robin fashion until they hit the Active RM. If the Active
>>>goes
>>> >down, they resume the round-robin polling until they hit the "new"
>>> >Active.²
>>> >
>>> >
>>> >Does Samza have any plans to support Yarn HA configurations?
>>>
>>>
>> -- 
>> Zhijie Shen
>> Hortonworks Inc.
>> http://hortonworks.com/
>> -- 
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or
>>entity to 
>> which it is addressed and may contain information that is confidential,
>> privileged and exempt from disclosure under applicable law. If the
>>reader 
>> of this message is not the intended recipient, you are hereby notified
>>that 
>> any printing, copying, dissemination, distribution, disclosure or
>> forwarding of this communication is strictly prohibited. If you have
>> received this communication in error, please contact the sender
>>immediately 
>> and delete it from your system. Thank You.

Mime
View raw message