samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ke Wu <ke.wu...@gmail.com>
Subject Re: [DISCUSS] SEP-23: Simplify Job Runner
Date Fri, 06 Dec 2019 21:52:22 GMT
As we are revamping our config loading logic in SEP-23: Simplify Job Runner, we will introduce
backward incompatible changes on the job submission logic, i.e. deprecating the usage of --config-factory
and --config-path, which reads a full config upon job submission, instead, we will only provide
job submission related config through --config and ConfigLoader will load the complete config
on AM instead.

Please take a look and chime in your thoughts.

Best,
Ke

> On Dec 2, 2019, at 4:42 PM, Ke Wu <ke.wu.cs@gmail.com> wrote:
> 
> Hi Xinyu,
> 
> Please see the response in line:
> 
>>  1. After this change, seems the original config-factory and config-path
>>  are only used to supply parameters for submitting job. Is that the case?
>>  Which configs are still needed in the submission?
> 
> Yes, only configs related to job submission is needed.
> 
> job.name, job.factory.class & yarn.package.path are the minimum three configs needed
for the job submission, which may be supplied by --config instead. 
> 
>>  2. For backward compatibility, does it still work if the user doesn't
>>  specify the new ConfigLoader in the command line? The
>>  PropertiesConfigLoader class seems requiring the path of the config after
>>  exploding the tgz.
> 
> If the user does not specify config loader in the config, then it will work in the previous
flow, where runner publishes configs in coordinator stream and job coordinator/application
master will pick it up by reading from Kafka. So this is a backward compatible change.
> 
>>  3. If the final plan is to remove the original config factory/path, how
>>  do we pass the parameters needed for Yarn submission, e.g. job name, id,
>>  and tgz path?
> 
> We can either pass them by --config or introduce delicate command line arguments for
it in CommandLine.scala.
> 
> 
> Let me know if you have any further questions.
> 
> Best,
> Ke
> 
>> On Nov 27, 2019, at 11:02 AM, Xinyu Liu <xinyuliu.us@gmail.com> wrote:
>> 
>> Thanks a lot for putting out the design for simplifying the job submission
>> process. The motivation makes sense to me that most of the planning and
>> config generation should be done after submitting to the cluster, instead
>> of during the submission, which can happen in a local sandbox without the
>> access to the resources needed for planning. It also improves the process
>> from the security stand of the view.
>> 
>> A few questions regarding to the interface changes:
>> 
>>  1. After this change, seems the original config-factory and config-path
>>  are only used to supply parameters for submitting job. Is that the case?
>>  Which configs are still needed in the submission?
>>  2. For backward compatibility, does it still work if the user doesn't
>>  specify the new ConfigLoader in the command line? The
>>  PropertiesConfigLoader class seems requiring the path of the config after
>>  exploding the tgz.
>>  3. If the final plan is to remove the original config factory/path, how
>>  do we pass the parameters needed for Yarn submission, e.g. job name, id,
>>  and tgz path?
>> 
>> Thanks,
>> Xinyu
>> 
>> On Fri, Nov 15, 2019 at 3:00 PM Ke Wu <ke.wu.cs@gmail.com> wrote:
>> 
>>> We created SEP-23: Simplify Job Runner, which simplifies job runner by
>>> moving config retrieval and planning to AM.
>>> 
>>> Please find out the SEP wiki below:
>>> 
>>> https://cwiki.apache.org/confluence/display/SAMZA/SEP-23%3A+Simplify+Job+Runner
>>> 
>>> Please take a look and chime in your thoughts.
>>> 
>>> Thanks,
>>> Ke
>>> 
> 


Mime
View raw message