metron-dev mailing list archives

From Andrew Psaltis <psaltis.and...@gmail.com>
Subject Re: Metron-265 Model as a Service
Date Thu, 07 Jul 2016 16:47:10 GMT
Thanks, Casey! A couple of quick questions.

RE: "which is why there would be a caching layer set in front of it at the
Storm bolt level"
Hmm, would this be a cache of the results of model execution? Would this
really work when each tuple may contain totally different data? Or is the
caching going to be smart enough that it will look at all the data passed
in, determine that an identical tuple has already been evaluated, and serve
the result out of cache?
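For what it's worth, here is a rough sketch of what I imagine such a bolt-level result cache could look like: results keyed by a digest of the tuple's model-relevant fields, so only genuinely identical inputs are served from cache. All names here are hypothetical, not from the proposal:

```python
import hashlib
import json
from collections import OrderedDict

class ModelResultCache:
    """Hypothetical bolt-level cache: model results keyed by a digest of
    the (model, params) pair, with simple LRU eviction."""

    def __init__(self, max_size=10000):
        self.max_size = max_size
        self._cache = OrderedDict()  # digest -> cached model result

    @staticmethod
    def _key(model_name, params):
        # Canonical JSON so identical inputs always produce the same digest.
        payload = json.dumps([model_name, params], sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_score(self, model_name, params, score_fn):
        key = self._key(model_name, params)
        if key in self._cache:
            self._cache.move_to_end(key)  # refresh LRU position on a hit
            return self._cache[key]
        result = score_fn(model_name, params)  # miss: call the model service
        self._cache[key] = result
        if len(self._cache) > self.max_size:
            self._cache.popitem(last=False)  # evict least recently used
        return result
```

Which, as the question implies, only pays off when the model-relevant subset of fields actually repeats across tuples (the same domain or IP scored over and over), not when every tuple is unique.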

RE: "Also, we would prefer local instances of the service when and where
possible"
Perfect, makes sense.

RE: Serving many models from every storm bolt is also fairly expensive.
I can see how it could be, but couldn't we make sure that not all models
live in every bolt?

RE: In this scenario, you can at least scale out via load balancing (i.e.
multiple model services serving the same model) since the models are
immutable.
This seems to address the model serving, not the model execution service.
Having yet one more layer to scale and manage also seems like it
would further complicate things. Could we not just also scale the bolts?
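To make concrete what I mean by not having every model in every bolt, a toy sketch (plain Python, not the Storm API; names are mine) of how grouping tuples by model name would pin each model to a single bolt task, so each task only loads its own subset:

```python
import hashlib

def task_for_model(model_name, num_tasks):
    """Deterministically map a model name to one bolt task, the way a
    fields grouping on the model name would route tuples."""
    digest = hashlib.md5(model_name.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_tasks

def models_hosted_by_task(all_models, task_id, num_tasks):
    """The subset of models a given bolt task would need to keep loaded."""
    return [m for m in all_models if task_for_model(m, num_tasks) == task_id]
```

With, say, 100 models over 4 tasks, each task ends up hosting roughly a quarter of them rather than all 100.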

Thanks,
Andrew




On Thu, Jul 7, 2016 at 12:37 PM, Casey Stella <cestella@gmail.com> wrote:

> So, regarding the expense of communication; I tend to agree that it is
> expensive, which is why there would be a caching layer set in front of it
> at the Storm bolt level.  Also, we would prefer local instances of the
> service when and where possible.  Serving many models from every storm bolt
> is also fairly expensive.  In this scenario, you can at least scale out via
> load balancing (i.e. multiple model services serving the same model) since
> the models are immutable.
>
> On Thu, Jul 7, 2016 at 9:24 AM, Andrew Psaltis <psaltis.andrew@gmail.com>
> wrote:
>
> > OK that makes sense. So the doc attached to this JIRA[1] just speaks to
> > the Model serving. Is there a doc for the model service? And by making
> > this a separate service we are saying that for every “MODEL_APPLY(model_name,
> > param_1, param_2, …, param_n)” we are potentially going to go across the
> > wire and have a model executed? That seems pretty expensive, no?
> >
> > Thanks,
> > Andrew
> >
> > [1] https://issues.apache.org/jira/browse/METRON-265
> >
> > On Thu, Jul 7, 2016 at 12:20 PM, Casey Stella <cestella@gmail.com>
> > wrote:
> >
> > > The "REST" model service, which I place in quotes because there is
> > > some strong discussion about whether REST is a reasonable transport
> > > for this, is responsible for providing the model.  The scoring/model
> > > application happens in the model service and the results get
> > > transferred back to the storm bolt that calls it.
> > >
> > > Casey
> > >
> > > On Thu, Jul 7, 2016 at 9:17 AM, Andrew Psaltis <psaltis.andrew@gmail.com>
> > > wrote:
> > >
> > > > Trying to make sure I grok this thread and the word doc attached to
> > > > the JIRA. The word doc and JIRA speak to a Model Service Service and
> > > > that the REST service will be responsible for serving up models.
> > > > However, part of this conversation seems to suggest that the model
> > > > execution will actually occur at the REST service .. in particular
> > > > this comment from James:
> > > >
> > > > "There are several reasons to decouple model execution from Storm:"
> > > >
> > > > If the model execution is decoupled from Storm then it appears that
> > > > the REST service will be executing the model, not just serving it up,
> > > > is that correct?
> > > >
> > > > Thanks,
> > > > Andrew
> > > >
> > > >
> > > >
> > > > On Thu, Jul 7, 2016 at 11:51 AM, Casey Stella <cestella@gmail.com>
> > > > wrote:
> > > >
> > > > > Regarding the performance of REST:
> > > > >
> > > > > Yep, so everyone seems to be worried about the performance
> > > > > implications for REST.  I made this comment on the JIRA, but I'll
> > > > > repeat it here for broader discussion:
> > > > >
> > > > > > My choice of REST was mostly due to the fact that I want to
> > > > > > support multi-language (I think that's a very important
> > > > > > requirement) and there are REST libraries for pretty much
> > > > > > everything. I do agree, however, that JSON transport can get
> > > > > > chunky. How about a compromise and use REST, but the input and
> > > > > > output payloads for scoring are Maps encoded in msgpack rather
> > > > > > than JSON. There is a msgpack library for pretty much every
> > > > > > language out there (almost) and certainly all of the ones we'd
> > > > > > like to target.
> > > > > >
> > > > >
> > > > >
> > > > > > The other option is to just create and expose protobuf bindings
> > > > > > (thrift doesn't have a native client for R) for all of the
> > > > > > languages that we want to support. I'm perfectly fine with that,
> > > > > > but I had some worries about the maturity of the bindings.
> > > > > >
> > > > >
> > > > >
> > > > > > The final option, as you suggest, is to just use raw sockets.  I
> > > > > > think if we went that route, we might have to create a layer for
> > > > > > each language rather than relying on model creators to create a
> > > > > > TCP server.  I thought that might be a bit onerous for a MVP.
> > > > > >
> > > > >
> > > > >
> > > > > > Given the discussion, though, what it has made me aware of is
> > > > > > that we might not want to dictate a transport mechanism at all,
> > > > > > but rather allow that to be pluggable and extensible (so each
> > > > > > model would be associated with a transport mechanism handler that
> > > > > > would know how to communicate to it.  We would provide default
> > > > > > mechanisms for msgpack over REST, JSON over REST and maybe
> > > > > > msgpack over raw TCP.) Thoughts?
> > > > >
> > > > >
> > > > > Regarding PMML:
> > > > >
> > > > > I tend to agree with James that PMML is too restrictive as to
> > > > > models it can represent and I have not had great experiences with
> > > > > it in production.  Also, the open source libraries for PMML have
> > > > > licensing issues (jpmml requires an older version to accommodate
> > > > > our licensing requirements).
> > > > >
> > > > > Regarding workflow:
> > > > >
> > > > > At the moment, I'd like to focus on getting a generalized
> > > > > infrastructure for model scoring and updating put in place.  This
> > > > > means, this architecture takes up the baton from the point when a
> > > > > model is trained/created.  Also, I have attempted to be generic in
> > > > > terms of output of the model (a map of results) so it can fit any
> > > > > type of model that I can think of.  If that's not the case, let me
> > > > > know, though.
> > > > >
> > > > > For instance, for clustering, you would probably emit the cluster
> > > > > id associated with the input and that would be added to the message
> > > > > as it passes through the storm topology.  The model is responsible
> > > > > for processing the input and constructing properly formed output.
> > > > >
> > > > > Casey
> > > > >
> > > > >
> > > > > On Tue, Jul 5, 2016 at 3:45 PM, Debo Dutta (dedutta) <dedutta@cisco.com>
> > > > > wrote:
> > > > >
> > > > > > Following up on the thread a little late ….  Awesome start Casey.
> > > > > > Some comments:
> > > > > > * Model execution
> > > > > > ** I am guessing the model execution will be on YARN only for
> > > > > > now. This is fine, but the REST call could have an overhead -
> > > > > > depends on the speed.
> > > > > > * PMML: won’t we have to choose some DSL for describing models?
> > > > > > * Model:
> > > > > > ** workflow vs a model - do we care about the “workflow" that
> > > > > > leads to the models or just the “model"? For example, we might
> > > > > > start with n features —> do feature selection to choose k (or
> > > > > > apply a transform function) —> apply a model etc
> > > > > > * Use cases - I can see this working for n-ary classification
> > > > > > style models easily. Will the same mechanism be used for stuff
> > > > > > like clustering (or intermediate steps like feature selection
> > > > > > alone).
> > > > > >
> > > > > > Thx
> > > > > > debo
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 7/5/16, 3:24 PM, "James Sirota" <jsirota@apache.org> wrote:
> > > > > >
> > > > > > >Simon,
> > > > > > >
> > > > > > >There are several reasons to decouple model execution from Storm:
> > > > > > >
> > > > > > >- Reliability: It's much easier to handle a failed service than a
> > > > > > >failed bolt.  You can also troubleshoot without having to bring
> > > > > > >down the topology
> > > > > > >- Complexity: you de-couple the model logic from Storm logic and
> > > > > > >can manage it independently of Storm
> > > > > > >- Portability: you can swap the model guts (switch from Spark to
> > > > > > >Flink, etc) and as long as you maintain the interface you are
> > > > > > >good to go
> > > > > > >- Consistency: since we want to expose our models the same way we
> > > > > > >expose threat intel then it makes sense to expose them as a
> > > > > > >service
> > > > > > >
> > > > > > >In our vision for Metron we want to make it easy to uptake and
> > > > > > >share models.  I think well-defined interfaces and programmatic
> > > > > > >ways of deployment, lifecycle management, and scoring via
> > > > > > >well-defined REST interfaces will make this task easier.  We can
> > > > > > >do a few things to
> > > > > > >
> > > > > > >With respect to PMML I personally had not had much luck with it
> > > > > > >in production.  I would prefer models as POJOs.
> > > > > > >
> > > > > > >Thanks,
> > > > > > >James
> > > > > > >
> > > > > > >04.07.2016, 16:07, "Simon Ball" <sball@hortonworks.com>:
> > > > > > >> Since the models' parameters and execution algorithm are
> > > > > > >> likely to be small, why not have the model store push the
> > > > > > >> model changes and scoring direct to the bolts and execute
> > > > > > >> within storm. This negates the overhead of a rest call to the
> > > > > > >> model server, and the need for discovery of the model server
> > > > > > >> in zookeeper.
> > > > > > >>
> > > > > > >> Something like the way ranger policies are updated / cached in
> > > > > > >> plugins would seem to make sense, so that we're distributing
> > > > > > >> the model execution directly into the enrichment pipeline
> > > > > > >> rather than collecting in a central service.
> > > > > > >>
> > > > > > >> This would work with simple models on single events, but may
> > > > > > >> struggle with correlation based models. However, those could
> > > > > > >> be handled in storm by pushing into a windowing trident
> > > > > > >> topology or something of the sort, or even with a parallel
> > > > > > >> spark streaming job using the same method of distributing
> > > > > > >> models.
> > > > > > >>
> > > > > > >> The real challenge here would be stateful online models, which
> > > > > > >> seem like a minority case which could be handled by a shared
> > > > > > >> state store such as HBase.
> > > > > > >>
> > > > > > >> You still keep the ability to run different languages, and
> > > > > > >> platforms, but wrap managing the parallelism in storm bolts
> > > > > > >> rather than yarn containers.
> > > > > > >>
> > > > > > >> We could also consider basing the model protocol on a common
> > > > > > >> model language like pmml, though that is likely to be highly
> > > > > > >> limiting.
> > > > > > >>
> > > > > > >> Simon
> > > > > > >>
> > > > > > >>>  On 4 Jul 2016, at 22:35, Casey Stella <cestella@gmail.com>
> > > > > > >>>  wrote:
> > > > > > >>>
> > > > > > >>>  This is great! I'll capture any requirements that anyone
> > > > > > >>>  wants to contribute and ensure that the proposed architecture
> > > > > > >>>  accommodates them. I think we should focus on a minimal set
> > > > > > >>>  of requirements and an architecture that does not preclude a
> > > > > > >>>  larger set. I have found that the best driver of requirements
> > > > > > >>>  are installed users. :)
> > > > > > >>>
> > > > > > >>>  For instance, I think a lot of questions about how often to
> > > > > > >>>  update a model and such should be represented in the
> > > > > > >>>  architecture by the ability to manually update a model, so as
> > > > > > >>>  long as we have the ability to update, people can choose when
> > > > > > >>>  and where to do it (i.e. time based or some other trigger).
> > > > > > >>>  That being said, we don't want to cause too much effort for
> > > > > > >>>  the user if we can avoid it with features.
> > > > > > >>>
> > > > > > >>>  In terms of the questions laid out, here are the constraints
> > > > > > >>>  from the proposed architecture as I see them. It'd be great
> > > > > > >>>  to get a sense of whether these constraints are too onerous
> > > > > > >>>  or where they're not opinionated enough:
> > > > > > >>>
> > > > > > >>>    - Model versioning and retention
> > > > > > >>>       - We do have the ability to update models, but the
> > > > > > >>>       training and decision of when to update the model is
> > > > > > >>>       left up to the user. We may want to think deeply about
> > > > > > >>>       when and where automated model updates can fit
> > > > > > >>>       - Also, retention is currently manual. It might be an
> > > > > > >>>       easier win to set up policies around when to sunset
> > > > > > >>>       models (after newer versions are added, for instance).
> > > > > > >>>    - Model access controls management
> > > > > > >>>       - The architecture proposes no constraints around this.
> > > > > > >>>       As it stands now, models are held in HDFS, so it would
> > > > > > >>>       inherit the same security capabilities from that
> > > > > > >>>       (user/group permissions + Ranger, etc)
> > > > > > >>>    - Requirements around concept drift
> > > > > > >>>       - I'd love to hear user requirements around how we could
> > > > > > >>>       automatically address concept drift. The architecture as
> > > > > > >>>       it's proposed lets the user decide when to update models.
> > > > > > >>>    - Requirements around model output
> > > > > > >>>       - The architecture as it stands just mandates a JSON map
> > > > > > >>>       input and JSON map output, so it's up to the model what
> > > > > > >>>       they want to pass back.
> > > > > > >>>       - It's also up to the model to document its own output.
> > > > > > >>>    - Any model audit and logging requirements
> > > > > > >>>       - The architecture proposes no constraints around this.
> > > > > > >>>       I'd love to see community guidance around this. As it
> > > > > > >>>       stands, we just log using the same mechanism as any YARN
> > > > > > >>>       application.
> > > > > > >>>    - What model metrics need to be exposed
> > > > > > >>>       - The architecture proposes no constraints around this.
> > > > > > >>>       I'd love to see community guidance around this.
> > > > > > >>>    - Requirements around failure modes
> > > > > > >>>       - We briefly touch on this in the document, but it is
> > > > > > >>>       probably not complete. Service endpoint failure will
> > > > > > >>>       result in blacklisting from a storm bolt perspective and
> > > > > > >>>       node failure should result in a new container being
> > > > > > >>>       started by the Yarn application master. Beyond that, the
> > > > > > >>>       architecture isn't explicit.
> > > > > > >>>
> > > > > > >>>>  On Mon, Jul 4, 2016 at 1:49 PM, James Sirota <jsirota@apache.org>
> > > > > > >>>>  wrote:
> > > > > > >>>>
> > > > > > >>>>  I left a comment on the JIRA. I think your design is
> > > > > > >>>>  promising.  One other thing I would suggest is for us to
> > > > > > >>>>  crowd source requirements around model management.
> > > > > > >>>>  Specifically:
> > > > > > >>>>
> > > > > > >>>>  Model versioning and retention
> > > > > > >>>>  Model access controls management
> > > > > > >>>>  Requirements around concept drift
> > > > > > >>>>  Requirements around model output
> > > > > > >>>>  Any model audit and logging requirements
> > > > > > >>>>  What model metrics need to be exposed
> > > > > > >>>>  Requirements around failure modes
> > > > > > >>>>
> > > > > > >>>>  03.07.2016, 14:00, "Casey Stella" <cestella@gmail.com>:
> > > > > > >>>>>  Hi all,
> > > > > > >>>>>
> > > > > > >>>>>  I think we are at the point where we should try to tackle
> > > > > > >>>>>  Model as a service for Metron. As such, I created a JIRA
> > > > > > >>>>>  and proposed an architecture for accomplishing this within
> > > > > > >>>>>  Metron.
> > > > > > >>>>>
> > > > > > >>>>>  My inclination is to be data science language/library
> > > > > > >>>>>  agnostic and to provide a general purpose REST
> > > > > > >>>>>  infrastructure for managing and serving models trained on
> > > > > > >>>>>  historical data captured from Metron.  The assumption is
> > > > > > >>>>>  that we are within the hadoop ecosystem, so:
> > > > > > >>>>>
> > > > > > >>>>>    - Models stored on HDFS
> > > > > > >>>>>    - REST Model Services resource-managed via Yarn
> > > > > > >>>>>    - REST Model Services discovered via Zookeeper.
> > > > > > >>>>>
> > > > > > >>>>>  I would really appreciate community comment on the JIRA
> > > > > > >>>>>  (https://issues.apache.org/jira/browse/METRON-265).  The
> > > > > > >>>>>  proposed architecture is attached as a document to that
> > > > > > >>>>>  JIRA.
> > > > > > >>>>>
> > > > > > >>>>>  I look forward to feedback!
> > > > > > >>>>>
> > > > > > >>>>>  Best,
> > > > > > >>>>>
> > > > > > >>>>>  Casey
> > > > > > >>>>
> > > > > > >>>>  -------------------
> > > > > > >>>>  Thank you,
> > > > > > >>>>
> > > > > > >>>>  James Sirota
> > > > > > >>>>  PPMC- Apache Metron (Incubating)
> > > > > > >>>>  jsirota AT apache DOT org
> > > > > > >
> > > > > > >-------------------
> > > > > > >Thank you,
> > > > > > >
> > > > > > >James Sirota
> > > > > > >PPMC- Apache Metron (Incubating)
> > > > > > >jsirota AT apache DOT org
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks,
> > > > Andrew
> > > >
> > > > Subscribe to my book: Streaming Data <http://manning.com/psaltis>
> > > > <https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
> > > > twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>
> > > >
> > >
> >
> >
> >
> > --
> > Thanks,
> > Andrew
> >
> >
>
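P.S. For anyone skimming the thread, here is a tiny sketch of the map-in/map-out scoring contract discussed above: JSON via the stdlib for illustration, with msgpack as the drop-in Casey suggests (msgpack.packb/unpackb in place of json.dumps/loads). The field names are my guesses, not from the doc:

```python
import json

def encode_score_request(model_name, params):
    """Build the map-shaped scoring request as bytes on the wire.
    Field names ("model", "inputs") are illustrative only."""
    request = {"model": model_name, "inputs": params}
    return json.dumps(request).encode("utf-8")

def decode_score_response(raw):
    """The response is likewise just a map of model results."""
    return json.loads(raw.decode("utf-8"))
```

The point of the map-in/map-out shape is that any language with a JSON (or msgpack) library can implement either side of it.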



-- 
Thanks,
Andrew

