metron-dev mailing list archives

From Casey Stella <ceste...@gmail.com>
Subject Re: Metron-265 Model as a Service
Date Thu, 07 Jul 2016 19:36:03 GMT
Alright, so let's think the transport layer through a bit.  The players on
the protocol stack are, as I see them:

   - REST
      - Pros
         - Simple to implement and understand for the model creator.  Just
           hook into a 3rd party library to serve up your model as a REST
           service and go
         - Mature REST implementations for the target non-JVM languages (R
           and Python)
      - Cons
         - Message overhead
      - Possible Mitigations to the Cons
         - Connection pooling and a caching layer in front of this will
           mitigate some of the latency concerns
   - Websockets
      - Pros
         - Less message overhead than with REST
         - Seems to be possible in both target non-JVM languages (R and
           Python)
      - Cons
         - The main supported R library (httpuv) appears to be GPL'd
           <https://github.com/rstudio/httpuv/blob/master/LICENSE>, so demo
           models would be impossible using that library
         - Unclear how mature the support is for non-JVM languages
   - Thrift
      - Pros
         - Potentially more performant, with a tighter binary serialization
           and less communication overhead
      - Cons
         - Non-existent R bindings
   - Custom Server in Java that uses Thrift and marshals requests to the
     process executing the model and forwards responses
      - Pros
         - We get more control over logging and auditing within the request
           process, so more granular metrics of model performance
         - Much of the actual work is done for us via Finagle and we can
           use Thrift as the protocol
      - Cons
         - More complex, with more custom code
         - Model creators would have to be comfortable handling requests
           coming over some sort of IPC mechanism (like named pipes or even
           a TCP connection)

My inclination is still toward the simplicity of REST, especially with a
caching layer in front of it.  That being said, I can see some benefit to a
custom server (using Finagle to do most of the heavy lifting) that would
forward requests/responses.  I do worry that that architectural direction
would downright necessitate a 3rd party library, maintained by us, to make
serving models easier, though.  I don't know that I particularly like going
down that route.
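
To make the REST option concrete, here is a rough sketch of what the
model-creator side could look like in Python with Flask.  The framework
choice, the /apply endpoint, and the toy scorer are illustrative only; the
only real contract is a JSON map in and a JSON map out.

    # Rough sketch only: a model author wraps an existing scoring function in
    # a tiny REST service.  Flask, the /apply endpoint, and the field names
    # are assumptions for illustration, not part of the proposal.
    from flask import Flask, request, jsonify

    app = Flask(__name__)

    def score(features):
        # Stand-in for the real model; returns a map of results.
        suspicious = len(features.get("domain", "")) > 20
        return {"is_malicious": suspicious, "confidence": 0.83}

    @app.route("/apply", methods=["POST"])
    def apply_model():
        features = request.get_json(force=True)  # JSON map in
        return jsonify(score(features))          # JSON map out

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=8080)

The connection pooling and caching layer mentioned above would sit in front
of something like this on the topology side.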

If I missed any, let me know.

Casey


On Thu, Jul 7, 2016 at 9:46 AM, Debo Dutta (dedutta) <dedutta@cisco.com>
wrote:

> IMO thrift >> rest. Another option is good old RPC/dRPC :)
>
>
>
>
> On 7/7/16, 9:17 AM, "Casey Stella" <cestella@gmail.com> wrote:
>
> >Yeah, I am slowly getting convinced that REST may be too much overhead and
> >tending closer to using Thrift and communicating to the model handler
> >(possibly in non-java) via some IPC.
> >
> >On Thu, Jul 7, 2016 at 9:15 AM, Simon Ball <sball@hortonworks.com> wrote:
> >
> >> Hi Casey,
> >>
> >> Just to clarify, my thought was web sockets, not raw sockets, language
> >> agnostic, though thrift or proton would be much better. Even with a
> >> non-JSON payload, REST is very heavy over HTTP. You'd be looking at
> >> probably 1-2kb of header overhead per packet scored just on transport
> >> headers. Web socket frames carry slightly less overhead per message.
> >>
> >> Simon
> >>
> >>
> >> > On 7 Jul 2016, at 16:51, Casey Stella <cestella@gmail.com> wrote:
> >> >
> >> > Regarding the performance of REST:
> >> >
> >> > Yep, so everyone seems to be worried about the performance implications
> >> > for REST.  I made this comment on the JIRA, but I'll repeat it here for
> >> > broader discussion:
> >> >
> >> >> My choice of REST was mostly due to the fact that I want to support
> >> >> multi-language (I think that's a very important requirement) and there
> >> >> are REST libraries for pretty much everything. I do agree, however,
> >> >> that JSON transport can get chunky. How about a compromise and use
> >> >> REST, but the input and output payloads for scoring are Maps encoded in
> >> >> msgpack rather than JSON. There is a msgpack library for pretty much
> >> >> every language out there (almost) and certainly all of the ones we'd
> >> >> like to target.
> >> >
> >> >
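
For concreteness, the msgpack compromise described above just swaps the JSON
body for a msgpack-encoded map.  A minimal sketch, assuming the Python
msgpack package (the field names are illustrative):

    # Rough sketch: the same scoring payload as a msgpack-encoded map instead
    # of JSON.  The msgpack package and the field names are assumptions.
    import msgpack

    features = {"src_ip": "10.0.0.1", "domain": "xqzvb.example.com"}
    body = msgpack.packb(features, use_bin_type=True)   # bytes on the wire
    restored = msgpack.unpackb(body, raw=False)         # back to a dict
    assert restored == features
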
> >> >> The other option is to just create and expose protobuf bindings
> >> >> (thrift doesn't have a native client for R) for all of the languages
> >> >> that we want to support. I'm perfectly fine with that, but I had some
> >> >> worries about the maturity of the bindings.
> >> >
> >> >
> >> >> The final option, as you suggest, is to just use raw sockets. I think
> >> >> if we went that route, we might have to create a layer for each
> >> >> language rather than relying on model creators to create a TCP server.
> >> >> I thought that might be a bit onerous for an MVP.
> >> >
> >> >
> >> >> Given the discussion, though, what it has made me aware of is that we
> >> >> might not want to dictate a transport mechanism at all, but rather
> >> >> allow that to be pluggable and extensible (so each model would be
> >> >> associated with a transport mechanism handler that would know how to
> >> >> communicate to it. We would provide default mechanisms for msgpack over
> >> >> REST, JSON over REST, and maybe msgpack over raw TCP.) Thoughts?
> >> >
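
To make the pluggable-transport idea above concrete, one way to express it is
a per-model handler interface with REST/JSON as a default implementation.
This is a sketch only; the names are invented rather than a proposed API, and
in Metron proper the handler would live on the JVM/Storm side rather than in
Python.

    # Rough sketch of the pluggable idea; class and method names are invented
    # here for illustration, not a proposed API.
    from abc import ABC, abstractmethod

    import requests

    class ModelTransport(ABC):
        """Knows how to send a map of features to a model and get a map back."""
        @abstractmethod
        def apply(self, features: dict) -> dict:
            ...

    class JsonRestTransport(ModelTransport):
        def __init__(self, url: str):
            self.url = url

        def apply(self, features: dict) -> dict:
            resp = requests.post(self.url, json=features, timeout=5)
            resp.raise_for_status()
            return resp.json()

    # A msgpack-over-REST or msgpack-over-TCP handler would implement the same
    # interface, so the caller only ever sees apply(features) -> results.
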
> >> >
> >> > Regarding PMML:
> >> >
> >> > I tend to agree with James that PMML is too restrictive as to models it
> >> > can represent and I have not had great experiences with it in
> >> > production.  Also, the open source libraries for PMML have licensing
> >> > issues (jpmml requires an older version to accommodate our licensing
> >> > requirements).
> >> >
> >> > Regarding workflow:
> >> >
> >> > At the moment, I'd like to focus on getting a generalized
> >> > infrastructure for model scoring and updating put in place.  This means
> >> > the architecture takes up the baton from the point when a model is
> >> > trained/created.  Also, I have attempted to be generic in terms of the
> >> > output of the model (a map of results) so it can fit any type of model
> >> > that I can think of.  If that's not the case, let me know, though.
> >> >
> >> > For instance, for clustering, you would probably emit the cluster id
> >> > associated with the input and that would be added to the message as it
> >> > passes through the storm topology.  The model is responsible for
> >> > processing the input and constructing properly formed output.
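
A toy illustration of that contract, with invented field names:

    # Toy example of the map-in / map-out contract; field names are invented.
    message = {"src_ip": "10.0.0.1", "domain": "xqzvb.example.com"}
    model_output = {"cluster_id": 7}   # what a clustering model might return
    message.update(model_output)       # merged into the message by the topology
    assert message["cluster_id"] == 7
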
> >> >
> >> > Casey
> >> >
> >> >
> >> > On Tue, Jul 5, 2016 at 3:45 PM, Debo Dutta (dedutta) <
> dedutta@cisco.com>
> >> > wrote:
> >> >
> >> >> Following up on the thread a little late… Awesome start, Casey.  Some
> >> >> comments:
> >> >> * Model execution
> >> >> ** I am guessing the model execution will be on YARN only for now. This
> >> >> is fine, but the REST call could have an overhead - depends on the
> >> >> speed.
> >> >> * PMML: won’t we have to choose some DSL for describing models?
> >> >> * Model:
> >> >> ** workflow vs a model - do we care about the “workflow” that leads to
> >> >> the models or just the “model”? For example, we might start with n
> >> >> features —> do feature selection to choose k (or apply a transform
> >> >> function) —> apply a model, etc.
> >> >> * Use cases - I can see this working for n-ary classification style
> >> >> models easily. Will the same mechanism be used for stuff like
> >> >> clustering (or intermediate steps like feature selection alone)?
> >> >>
> >> >> Thx
> >> >> debo
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>> On 7/5/16, 3:24 PM, "James Sirota" <jsirota@apache.org> wrote:
> >> >>>
> >> >>> Simon,
> >> >>>
> >> >>> There are several reasons to decouple model execution from Storm:
> >> >>>
> >> >>> - Reliability: It's much easier to handle a failed service than a
> >> >>> failed bolt.  You can also troubleshoot without having to bring down
> >> >>> the topology
> >> >>> - Complexity: you de-couple the model logic from Storm logic and can
> >> >>> manage it independently of Storm
> >> >>> - Portability: you can swap the model guts (switch from Spark to
> >> >>> Flink, etc.) and as long as you maintain the interface you are good to
> >> >>> go
> >> >>> - Consistency: since we want to expose our models the same way we
> >> >>> expose threat intel, it makes sense to expose them as a service
> >> >>>
> >> >>> In our vision for Metron we want to make it easy to uptake and share
> >> >>> models.  I think well-defined interfaces and programmatic ways of
> >> >>> deployment, lifecycle management, and scoring via well-defined REST
> >> >>> interfaces will make this task easier.  We can do a few things to
> >> >>>
> >> >>> With respect to PMML, I personally had not had much luck with it in
> >> >>> production.  I would prefer models as POJOs.
> >> >>>
> >> >>> Thanks,
> >> >>> James
> >> >>>
> >> >>> 04.07.2016, 16:07, "Simon Ball" <sball@hortonworks.com>:
> >> >>>> Since the models' parameters and execution algorithm are likely to
> >> >>>> be small, why not have the model store push the model changes and
> >> >>>> scoring direct to the bolts and execute within storm?  This negates
> >> >>>> the overhead of a rest call to the model server, and the need for
> >> >>>> discovery of the model server in zookeeper.
> >> >>>>
> >> >>>> Something like the way ranger policies are updated / cached in
> >> >>>> plugins would seem to make sense, so that we're distributing the
> >> >>>> model execution directly into the enrichment pipeline rather than
> >> >>>> collecting in a central service.
> >> >>>>
> >> >>>> This would work with simple models on single events, but may
> >> >>>> struggle with correlation-based models. However, those could be
> >> >>>> handled in storm by pushing into a windowing trident topology or
> >> >>>> something of the sort, or even with a parallel spark streaming job
> >> >>>> using the same method of distributing models.
> >> >>>>
> >> >>>> The real challenge here would be stateful online models, which seem
> >> >>>> like a minority case that could be handled by a shared state store
> >> >>>> such as HBase.
> >> >>>>
> >> >>>> You still keep the ability to run different languages and platforms,
> >> >>>> but wrap managing the parallelism in storm bolts rather than yarn
> >> >>>> containers.
> >> >>>>
> >> >>>> We could also consider basing the model protocol on a common model
> >> >>>> language like PMML, though that is likely to be highly limiting.
> >> >>>>
> >> >>>> Simon
> >> >>>>
> >> >>>>> On 4 Jul 2016, at 22:35, Casey Stella <cestella@gmail.com> wrote:
> >> >>>>>
> >> >>>>> This is great! I'll capture any requirements that anyone wants to
> >> >>>>> contribute and ensure that the proposed architecture accommodates
> >> >>>>> them. I think we should focus on a minimal set of requirements and
> >> >>>>> an architecture that does not preclude a larger set. I have found
> >> >>>>> that the best driver of requirements is installed users. :)
> >> >>>>>
> >> >>>>> For instance, I think a lot of questions about how often to update
> >> >>>>> a model and such should be represented in the architecture by the
> >> >>>>> ability to manually update a model, so as long as we have the
> >> >>>>> ability to update, people can choose when and where to do it (i.e.
> >> >>>>> time-based or some other trigger). That being said, we don't want to
> >> >>>>> cause too much effort for the user if we can avoid it with features.
> >> >>>>>
> >> >>>>> In terms of the questions laid out, here are the constraints from
> >> >>>>> the proposed architecture as I see them. It'd be great to get a
> >> >>>>> sense of whether these constraints are too onerous or where they're
> >> >>>>> not opinionated enough:
> >> >>>>>
> >> >>>>>   - Model versioning and retention
> >> >>>>>      - We do have the ability to update models, but the training and
> >> >>>>>      decision of when to update the model is left up to the user.
> >> >>>>>      We may want to think deeply about when and where automated
> >> >>>>>      model updates can fit
> >> >>>>>      - Also, retention is currently manual. It might be an easier
> >> >>>>>      win to set up policies around when to sunset models (after
> >> >>>>>      newer versions are added, for instance).
> >> >>>>>   - Model access controls management
> >> >>>>>      - The architecture proposes no constraints around this. As it
> >> >>>>>      stands now, models are held in HDFS, so it would inherit the
> >> >>>>>      same security capabilities from that (user/group permissions +
> >> >>>>>      Ranger, etc)
> >> >>>>>   - Requirements around concept drift
> >> >>>>>      - I'd love to hear user requirements around how we could
> >> >>>>>      automatically address concept drift. The architecture as it's
> >> >>>>>      proposed lets the user decide when to update models.
> >> >>>>>   - Requirements around model output
> >> >>>>>      - The architecture as it stands just mandates a JSON map input
> >> >>>>>      and JSON map output, so it's up to the model what it wants to
> >> >>>>>      pass back.
> >> >>>>>      - It's also up to the model to document its own output.
> >> >>>>>   - Any model audit and logging requirements
> >> >>>>>      - The architecture proposes no constraints around this. I'd
> >> >>>>>      love to see community guidance around this. As it stands, we
> >> >>>>>      just log using the same mechanism as any YARN application.
> >> >>>>>   - What model metrics need to be exposed
> >> >>>>>      - The architecture proposes no constraints around this. I'd
> >> >>>>>      love to see community guidance around this.
> >> >>>>>   - Requirements around failure modes
> >> >>>>>      - We briefly touch on this in the document, but it is probably
> >> >>>>>      not complete. Service endpoint failure will result in
> >> >>>>>      blacklisting from a storm bolt perspective, and node failure
> >> >>>>>      should result in a new container being started by the Yarn
> >> >>>>>      application master. Beyond that, the architecture isn't
> >> >>>>>      explicit.
> >> >>>>>
> >> >>>>>> On Mon, Jul 4, 2016 at 1:49 PM, James Sirota <jsirota@apache.org>
> >> >>>>>> wrote:
> >> >>>>>>
> >> >>>>>> I left a comment on the JIRA. I think your design is promising. One
> >> >>>>>> other thing I would suggest is for us to crowd-source requirements
> >> >>>>>> around model management. Specifically:
> >> >>>>>>
> >> >>>>>> Model versioning and retention
> >> >>>>>> Model access controls management
> >> >>>>>> Requirements around concept drift
> >> >>>>>> Requirements around model output
> >> >>>>>> Any model audit and logging requirements
> >> >>>>>> What model metrics need to be exposed
> >> >>>>>> Requirements around failure modes
> >> >>>>>>
> >> >>>>>> 03.07.2016, 14:00, "Casey Stella" <cestella@gmail.com>:
> >> >>>>>>> Hi all,
> >> >>>>>>>
> >> >>>>>>> I think we are at the point where we should try to tackle Model as
> >> >>>>>>> a service for Metron. As such, I created a JIRA and proposed an
> >> >>>>>>> architecture for accomplishing this within Metron.
> >> >>>>>>>
> >> >>>>>>> My inclination is to be data science language/library agnostic and
> >> >>>>>>> to provide a general purpose REST infrastructure for managing and
> >> >>>>>>> serving models trained on historical data captured from Metron.
> >> >>>>>>> The assumption is that we are within the hadoop ecosystem, so:
> >> >>>>>>>
> >> >>>>>>>   - Models stored on HDFS
> >> >>>>>>>   - REST Model Services resource-managed via Yarn
> >> >>>>>>>   - REST Model Services discovered via Zookeeper.
> >> >>>>>>>
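
As a rough sketch of the discovery piece above, a model service could
register an ephemeral znode that topologies watch.  The kazoo client and the
/metron/models znode layout below are assumptions, not the proposed design.

    # Rough sketch: a model service registering itself in Zookeeper so that
    # topologies can discover it.  kazoo and the znode layout are assumptions.
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zk1.example.com:2181")
    zk.start()
    zk.ensure_path("/metron/models/dga_detector/1.0")
    zk.create("/metron/models/dga_detector/1.0/endpoint-",
              b"http://worker-node:8080/apply",
              ephemeral=True, sequence=True)  # goes away if the container dies
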
> >> >>>>>>> I would really appreciate community comment on the JIRA
> >> >>>>>>> (https://issues.apache.org/jira/browse/METRON-265). The proposed
> >> >>>>>>> architecture is attached as a document to that JIRA.
> >> >>>>>>>
> >> >>>>>>> I look forward to feedback!
> >> >>>>>>>
> >> >>>>>>> Best,
> >> >>>>>>>
> >> >>>>>>> Casey
> >> >>>>>>
> >> >>>>>> -------------------
> >> >>>>>> Thank you,
> >> >>>>>>
> >> >>>>>> James Sirota
> >> >>>>>> PPMC- Apache Metron (Incubating)
> >> >>>>>> jsirota AT apache DOT org
> >> >>>
> >> >>> -------------------
> >> >>> Thank you,
> >> >>>
> >> >>> James Sirota
> >> >>> PPMC- Apache Metron (Incubating)
> >> >>> jsirota AT apache DOT org
> >> >>
> >>
>
