samza-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Navina Ramesh <nram...@linkedin.com.INVALID>
Subject Re: [DISCUSS] SEP-2: ApplicationRunner Design
Date Fri, 21 Apr 2017 07:12:13 GMT
Hey Yi,
Thanks for lot for your work on this document. I know it must have been
crazy trying to put-together everything in a single doc :)

Here are my comments. Sorry about the delay :(

1. It will be useful to set some background for the benefit of the
community members who haven't been following design docs in the JIRAs. Can
you briefly explain the definition of StreamApplication and how it
translates to jobs through the stack.

2. "Problem" section doesn't seem to describe any problem that
ApplicationRunner is solving :) Imo, ApplicationRunner basically provides a
unified programming pattern for the user to execute StreamApplications
defined using fluent-api or task-level API. I think the problem and
motivation section can use a little bit of re-wording.

3. In the "Overview of ApplicationRunner" section:
* How the components within ApplicationRunner interact isn't very obvious
from the overview image. For example, ExecutionPlanner translates a
"StreamApplication" into an "ExecutionPlan" which is essentially a
specification of the DAG. (Please correct me, if I am wrong here!). The
ExecutionPlan is used by the JobRunner to launch Samza jobs.
* The roles of ExecutionPlanner and JobRunner are fairly well-defined.
StreamManager seems like a util class that helps class-load systems and
create streams. The ExecutionPlan will be consumed by JobRunner and
JobRunner will use StreamManager to create intermediate streams, prior to
launching jobs. It doesn't sound like a StreamManager is a "component" of
the ApplicationRunner.
* What is the role of the RuntimeEnvironment? That has not been explained.
Maybe explaining that will fill the gap in understanding for the readers. I
see that you have tried to explain the flow of control in the code using
the sequence diagram. Perhaps, if we can articulate the
roles/responsibilities of the RuntimeEnvironment, there will not be a need
for the control flow diagram.

4. How is runtime environment defined by the user? Is it configurable ?
Answering these questions in the doc will be useful

5. In the "Interaction between RuntimeEnvironment and ApplicationRunners"
section:
* Samza container is interacting with the RuntimeEnvironment. Does that
make the RuntimeEnvironment as a shared component between the
LocalApplicationRunner and the SamzaContainer? It doesn't seem to be the
case for RemoteApplicationRunner. So, I am confused as to why it is
different.

6. In general, what does "app.class" config represent?  It seems
straightforward when a "StreamApplication" is defined. Is it applicable
when using low-level task api?

7. Interface defintions:
* Perhaps when you implement this, can you specifically callout if each
method is blocking or not in the javadoc ?

8. Minor nit-picking:
* "ApplicationRunners in Different Execution Environments" -> should it be
RuntimeEnvironments as that is the terminology used in the rest of the
document.
        * In the "How this works in standalone deployment" section:
* "Deploy the application to standalone hosts" and *Run run-local-app.sh on
each node to start/stop the local application* are probably just a single
step - Deploy the application to standalone hosts using run-local-app.sh??


General question:
It seems like, even with extensive changes to the interfaces/programming
model, we are still class loading the components for most parts. In such a
world, we are not close to integrating with frameworks that already have a
lifecycle model and can provide instantiated objects directly. For example,
in the Samza as a library use-case, it makes sense for the user to provide
a JmxServer or a taskFactory or a custom metricReporter for the
StreamProcessor. One of the motivations for this case was that most
applications are already running within a servlet/jetty container model
with its own lifecycle. If ApplicationRunner(s) is the unified interface,
doesn't that prohibit Samza from being integrated with such frameworks?

Thanks!
Navina

On Thu, Apr 20, 2017 at 10:06 AM, Jacob Maes <jacob.maes@gmail.com> wrote:

> Thanks for the SEP!
>
> +1 on introducing these new components
> -1 on the current definition of their roles (see Design feedback below)
>
> *Design*
>
>    - If LocalJobRunner and RemoteJobRunner handle the different methods of
>    launching a Job, what additional value do the different types of
>    ApplicationRunner and RuntimeEnvironment provide? It seems like a red
> flag
>    that all 3 would need to change from environment to environment. It
>    indicates that they don't have proper modularity. The
> call-sequence-figures
>    support this; LocalApplicationRunner and RemoteApplicationRunner make
> the
>    same calls and the diagram only varies after jobRunner.start()
>    - As far as I can tell, the only difference between Local and Remote
>    ApplicationRunner is that one is blocking and the other is
> non-blocking. If
>    that's all they're for then either the names should be changed to
> reflect
>    this, or they should be combined into one ApplicationRunner and just
> expose
>    separate methods for run() and runBlocking()
>    - There isn't much detail on why the main() methods for Local/Remote
>    have such different implementations, how they receive the Application
>    (direct vs config), and concretely how the deployment scripts, if any,
>    should interact with them.
>
>
> *Style*
>
>    - nit: None of the 11 uses of the word "actual" in the doc are
> *actually*
>    needed. :-)
>    - nit: Colors of the runtime blocks in the diagrams are unconventional
>    and a little distracting. Reminds me of nai won bao. Now I'm hungry. :-)
>    - Prefer the name "ExecutionEnvironment" over "RuntimeEnvironment". The
>    term "execution environment" is used
>    - The code comparisons for the ApplicationRunners are not apples-apples.
>    The local runner example is an application that USES the local runner.
> The
>    remote runner example is the just the runner code itself. So, it's not
>    readily apparent that we're comparing the main() methods and not the
>    application itself.
>
>
> On Mon, Apr 17, 2017 at 5:02 PM, Yi Pan <nickpan47@gmail.com> wrote:
>
> > Made some updates to clarify the role and functions of RuntimeEnvironment
> > in SEP-2.
> >
> > On Fri, Apr 14, 2017 at 9:30 AM, Yi Pan <nickpan47@gmail.com> wrote:
> >
> > > Hi, everyone,
> > >
> > > In light of new features such as fluent API and standalone that
> introduce
> > > new deployment / application launch models in Samza, I created a new
> > SEP-2
> > > to address the new use cases. SEP-2 link: https://cwiki.apache.
> > > org/confluence/display/SAMZA/SEP-2%3A+ApplicationRunner+Design
> > >
> > > Please take a look and give feedbacks!
> > >
> > > Thanks!
> > >
> > > -Yi
> > >
> >
>



-- 
Navina R.

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message