metron-dev mailing list archives

From Casey Stella <ceste...@gmail.com>
Subject Re: Metron-265 Model as a Service
Date Thu, 07 Jul 2016 15:51:21 GMT
Regarding the performance of REST:

Yep, so everyone seems to be worried about the performance implications of
REST. I made this comment on the JIRA, but I'll repeat it here for broader
discussion:

> My choice of REST was mostly driven by the fact that I want to support
> multi-language use (I think that's a very important requirement), and there
> are REST libraries for pretty much everything. I do agree, however, that
> JSON transport can get chunky. How about a compromise: use REST, but encode
> the input and output payloads for scoring as maps in msgpack rather than
> JSON? There is a msgpack library for pretty much every language out there
> (almost), and certainly for all of the ones we'd like to target.
>


> The other option is to just create and expose protobuf bindings (thrift
> doesn't have a native client for R) for all of the languages that we want
> to support. I'm perfectly fine with that, but I had some worries about the
> maturity of the bindings.
>


> The final option, as you suggest, is to just use raw sockets. I think if
> we went that route, we might have to create a layer for each language
> rather than relying on model creators to create a TCP server. I thought
> that might be a bit onerous for an MVP.
>


> The discussion has made me aware, though, that we might not want to
> dictate a transport mechanism at all, but rather make it pluggable and
> extensible: each model would be associated with a transport mechanism
> handler that knows how to communicate with it. We would provide default
> mechanisms for msgpack over REST, JSON over REST, and maybe msgpack over
> raw TCP. Thoughts?
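
To make the pluggable transport idea a bit more concrete, here is a rough
sketch of what a transport handler could look like from the client (Storm)
side. None of these class names exist today; the msgpack-over-REST
implementation below is only meant to illustrate the shape of the contract
(map in, map out), using the msgpack-jackson binding and plain java.net:

    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.msgpack.jackson.dataformat.MessagePackFactory;

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Map;

    // Hypothetical contract for a pluggable transport: score one input map
    // against a model endpoint and hand back the model's output map.
    public interface ModelTransport {
      Map<String, Object> score(String endpointUrl, Map<String, Object> input) throws Exception;
    }

    // One possible default: maps encoded with msgpack, carried over a REST POST.
    class MsgPackRestTransport implements ModelTransport {
      private final ObjectMapper mapper = new ObjectMapper(new MessagePackFactory());

      @Override
      @SuppressWarnings("unchecked")
      public Map<String, Object> score(String endpointUrl, Map<String, Object> input) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(endpointUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/x-msgpack");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
          out.write(mapper.writeValueAsBytes(input));   // msgpack-encoded request body
        }
        try (InputStream in = conn.getInputStream()) {
          return (Map<String, Object>) mapper.readValue(in, Map.class);  // msgpack-encoded response
        }
      }
    }

A JSON-over-REST default would be the same class with a plain ObjectMapper,
and a raw-TCP handler would just be another implementation of the same
interface; the bolt would only ever see the interface.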


Regarding PMML:

I tend to agree with James that PMML is too restrictive in the models it can
represent, and I have not had great experiences with it in production.
Also, the open source libraries for PMML have licensing issues (jpmml
requires using an older version to accommodate our licensing requirements).

Regarding workflow:

At the moment, I'd like to focus on getting a generalized infrastructure
for model scoring and updating put in place. This means that this
architecture takes up the baton from the point when a model is
trained/created. Also, I have attempted to be generic in terms of the
model's output (a map of results) so that it can fit any type of model I can
think of. If that's not the case, let me know.
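
In other words, from the scoring side a model is just a function from a map
of fields to a map of results. Purely as an illustration (the model and
field names below are invented), a trivial model might look like:

    import java.util.HashMap;
    import java.util.Map;

    // Purely illustrative model: any classifier, regressor, or clusterer can
    // be hidden behind this "map in, map out" shape.
    public class DomainLengthModel {
      public Map<String, Object> apply(Map<String, Object> input) {
        String domain = (String) input.getOrDefault("domain", "");
        Map<String, Object> results = new HashMap<>();
        results.put("is_suspicious", domain.length() > 25);          // example boolean result
        results.put("score", Math.min(1.0, domain.length() / 50.0)); // example numeric result
        return results;
      }
    }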

For instance, for clustering, you would probably emit the cluster id
associated with the input, and that would be added to the message as it
passes through the Storm topology. The model is responsible for processing
the input and constructing properly formed output.
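
So a clustering model might return something like {"cluster_id": 7}, and the
enrichment topology would just fold that map into the telemetry message. A
sketch of that merge (the field naming convention here is only a strawman):

    import java.util.HashMap;
    import java.util.Map;

    public class ModelEnrichment {
      // Fold a model's output map into the telemetry message, prefixing each
      // key with the model name (e.g. "kmeans.cluster_id") so that results
      // from different models don't collide as the message moves through the
      // topology.
      public static Map<String, Object> enrich(Map<String, Object> message,
                                               String modelName,
                                               Map<String, Object> modelOutput) {
        Map<String, Object> enriched = new HashMap<>(message);
        for (Map.Entry<String, Object> entry : modelOutput.entrySet()) {
          enriched.put(modelName + "." + entry.getKey(), entry.getValue());
        }
        return enriched;
      }
    }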

Casey


On Tue, Jul 5, 2016 at 3:45 PM, Debo Dutta (dedutta) <dedutta@cisco.com>
wrote:

> Following up on the thread a little late … Awesome start, Casey. Some
> comments:
> * Model execution
> ** I am guessing the model execution will be on YARN only for now. This is
> fine, but the REST call could add overhead - it depends on the speed.
> * PMML: won't we have to choose some DSL for describing models?
> * Model:
> ** workflow vs. a model - do we care about the "workflow" that leads to
> the models or just the "model"? For example, we might start with n features
> -> do feature selection to choose k (or apply a transform function) ->
> apply a model, etc.
> * Use cases - I can see this working for n-ary classification-style models
> easily. Will the same mechanism be used for stuff like clustering (or for
> intermediate steps like feature selection alone)?
>
> Thx
> debo
>
>
>
>
> On 7/5/16, 3:24 PM, "James Sirota" <jsirota@apache.org> wrote:
>
> >Simon,
> >
> >There are several reasons to decouple model execution from Storm:
> >
> >- Reliability: it's much easier to handle a failed service than a failed
> >bolt. You can also troubleshoot without having to bring down the topology.
> >- Complexity: you decouple the model logic from the Storm logic and can
> >manage it independently of Storm.
> >- Portability: you can swap the model guts (switch from Spark to Flink,
> >etc.), and as long as you maintain the interface you are good to go.
> >- Consistency: since we want to expose our models the same way we expose
> >threat intel, it makes sense to expose them as a service.
> >
> >In our vision for Metron we want to make it easy to adopt and share
> >models. I think well-defined interfaces and programmatic ways of handling
> >deployment, lifecycle management, and scoring via REST will make this task
> >easier. We can do a few things to
> >
> >With respect to PMML, I personally have not had much luck with it in
> >production. I would prefer models as POJOs.
> >
> >Thanks,
> >James
> >
> >04.07.2016, 16:07, "Simon Ball" <sball@hortonworks.com>:
> >> Since the models' parameters and execution algorithm are likely to be
> >> small, why not have the model store push the model changes and scoring
> >> logic directly to the bolts and execute within Storm? This negates the
> >> overhead of a REST call to the model server, and the need for discovery
> >> of the model server in Zookeeper.
> >>
> >> Something like the way Ranger policies are updated / cached in plugins
> >> would seem to make sense, so that we're distributing the model execution
> >> directly into the enrichment pipeline rather than collecting it in a
> >> central service.
> >>
> >> This would work with simple models on single events, but may struggle
> >> with correlation-based models. However, those could be handled in Storm
> >> by pushing into a windowing Trident topology or something of the sort,
> >> or even with a parallel Spark Streaming job using the same method of
> >> distributing models.
> >>
> >> The real challenge here would be stateful online models, which seem
> >> like a minority case that could be handled by a shared state store such
> >> as HBase.
> >>
> >> You still keep the ability to run different languages and platforms,
> >> but manage the parallelism in Storm bolts rather than in YARN
> >> containers.
> >>
> >> We could also consider basing the model protocol on a common model
> >> language like PMML, though that is likely to be highly limiting.
> >>
> >> Simon
> >>
> >>>  On 4 Jul 2016, at 22:35, Casey Stella <cestella@gmail.com> wrote:
> >>>
> >>>  This is great! I'll capture any requirements that anyone wants to
> >>>  contribute and ensure that the proposed architecture accommodates
> >>>  them. I think we should focus on a minimal set of requirements and an
> >>>  architecture that does not preclude a larger set. I have found that
> >>>  the best driver of requirements is installed users. :)
> >>>
> >>>  For instance, I think a lot of questions about how often to update a
> >>>  model and such should be represented in the architecture by the
> >>>  ability to manually update a model; as long as we have the ability to
> >>>  update, people can choose when and where to do it (e.g., time-based or
> >>>  some other trigger). That being said, we don't want to require too
> >>>  much effort from the user if we can avoid it with features.
> >>>
> >>>  In terms of the questions laid out, here are the constraints from the
> >>>  proposed architecture as I see them. It'd be great to get a sense of
> >>>  whether these constraints are too onerous or where they're not
> >>>  opinionated enough:
> >>>
> >>>    - Model versioning and retention
> >>>    - We do have the ability to update models, but the training and the
> >>>       decision of when to update the model are left up to the user. We
> >>>       may want to think deeply about when and where automated model
> >>>       updates can fit.
> >>>       - Also, retention is currently manual. It might be an easier win
> >>>       to set up policies around when to sunset models (after newer
> >>>       versions are added, for instance).
> >>>    - Model access controls management
> >>>    - The architecture proposes no constraints around this. As it stands
> >>>       now, models are held in HDFS, so they would inherit HDFS's
> >>>       security capabilities (user/group permissions + Ranger, etc.).
> >>>    - Requirements around concept drift
> >>>    - I'd love to hear user requirements around how we could
> >>>       automatically address concept drift. The architecture as proposed
> >>>       lets the user decide when to update models.
> >>>    - Requirements around model output
> >>>    - The architecture as it stands just mandates a JSON map input and
> >>>       JSON map output, so it's up to the model what it wants to pass
> >>>       back.
> >>>       - It's also up to the model to document its own output.
> >>>    - Any model audit and logging requirements
> >>>    - The architecture proposes no constraints around this. I'd love to
> >>>       see community guidance here. As it stands, we just log using the
> >>>       same mechanism as any YARN application.
> >>>    - What model metrics need to be exposed
> >>>    - The architecture proposes no constraints around this. I'd love to
> >>>       see community guidance here.
> >>>    - Requirements around failure modes
> >>>    - We briefly touch on this in the document, but it is probably not
> >>>       complete. Service endpoint failure will result in blacklisting
> >>>       from a Storm bolt's perspective, and node failure should result
> >>>       in a new container being started by the YARN application master.
> >>>       Beyond that, the architecture isn't explicit.
> >>>
> >>>>  On Mon, Jul 4, 2016 at 1:49 PM, James Sirota <jsirota@apache.org>
> wrote:
> >>>>
> >>>>  I left a comment on the JIRA. I think your design is promising. One
> >>>>  other thing I would suggest is for us to crowd source requirements
> >>>>  around model management. Specifically:
> >>>>
> >>>>  Model versioning and retention
> >>>>  Model access controls management
> >>>>  Requirements around concept drift
> >>>>  Requirements around model output
> >>>>  Any model audit and logging requirements
> >>>>  What model metrics need to be exposed
> >>>>  Requirements around failure modes
> >>>>
> >>>>  03.07.2016, 14:00, "Casey Stella" <cestella@gmail.com>:
> >>>>>  Hi all,
> >>>>>
> >>>>>  I think we are at the point where we should try to tackle Model as a
> >>>>>  service for Metron. As such, I created a JIRA and proposed an
> >>>>>  architecture for accomplishing this within Metron.
> >>>>>
> >>>>>  My inclination is to be data science language/library agnostic and
> >>>>>  to provide a general-purpose REST infrastructure for managing and
> >>>>>  serving models trained on historical data captured from Metron. The
> >>>>>  assumption is that we are within the Hadoop ecosystem, so:
> >>>>>
> >>>>>    - Models stored on HDFS
> >>>>>    - REST Model Services resource-managed via YARN
> >>>>>    - REST Model Services discovered via Zookeeper
> >>>>>
> >>>>>  I would really appreciate community comment on the JIRA (
> >>>>>  https://issues.apache.org/jira/browse/METRON-265). The proposed
> >>>>>  architecture is attached as a document to that JIRA.
> >>>>>
> >>>>>  I look forward to feedback!
> >>>>>
> >>>>>  Best,
> >>>>>
> >>>>>  Casey
> >>>>
> >>>>  -------------------
> >>>>  Thank you,
> >>>>
> >>>>  James Sirota
> >>>>  PPMC- Apache Metron (Incubating)
> >>>>  jsirota AT apache DOT org
> >
> >-------------------
> >Thank you,
> >
> >James Sirota
> >PPMC- Apache Metron (Incubating)
> >jsirota AT apache DOT org
>
