metron-dev mailing list archives

From Casey Stella <ceste...@gmail.com>
Subject Re: Metron-265 Model as a Service
Date Fri, 08 Jul 2016 15:01:43 GMT
Feature selection is so very model specific that I don't know of a good
way to generalize it.  Really, I try very hard not to get in the way of
data scientists doing data science, and feature selection sits squarely in
that domain.  For the DGA model, it's fairly simple to make the cache
key: the domain.  There should be a strong working set for that model, so
the cache should be fairly effective.
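
As a rough illustration of that (assuming a Guava LoadingCache, the same
style of LRU cache already sitting in front of the HBase enrichment
adapters; the ModelClient type below is a made-up stand-in for whatever
client ends up talking to the model service, not an existing API):

import java.util.Map;
import java.util.concurrent.TimeUnit;

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

public class DgaScoreCache {

  /** Hypothetical client for the model service; purely illustrative. */
  public interface ModelClient {
    Map<String, Object> score(String modelName, String domain) throws Exception;
  }

  private final LoadingCache<String, Map<String, Object>> cache;

  public DgaScoreCache(final ModelClient client) {
    this.cache = CacheBuilder.newBuilder()
        .maximumSize(10000)                     // bound the working set
        .expireAfterWrite(15, TimeUnit.MINUTES) // pick up model updates eventually
        .build(new CacheLoader<String, Map<String, Object>>() {
          @Override
          public Map<String, Object> load(String domain) throws Exception {
            // Only called on a miss; the domain is the entire cache key.
            return client.score("dga", domain);
          }
        });
  }

  public Map<String, Object> score(String domain) {
    return cache.getUnchecked(domain);
  }
}

With a strong working set, most tuples would be served out of the cache
and never touch the model service at all.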

On Fri, Jul 8, 2016 at 10:05 AM, Andrew Psaltis <psaltis.andrew@gmail.com>
wrote:

> Very interesting. Stellar does make life easier/more transparent for JVM
> developers, but it seems a client will still end up having to be
> developed for non-JVM languages; otherwise folks will be left to do all
> of the work themselves.
>
> Based on some of the reasons cited above (GPU, etc.) it seems like a
> requirement to have a separate model service. It would be really nice to
> find a way to keep the innards of that API from leaking everywhere else.
> If all feature selection happens in the model service, that raises some
> of the same questions Simon did: how can the Storm bolts generate a good
> cache key so that the model service does not need to be called? Perhaps
> this could be solved with a model service library rather than just
> exposing a REST (or other protocol) endpoint; the feature selection and
> cache key generation could then be hidden from clients. That still poses
> the issue of having to carry this abstraction across languages.
>
> On Fri, Jul 8, 2016 at 9:33 AM, Casey Stella <cestella@gmail.com> wrote:
>
> > So, it's an interesting discussion about where exactly feature
> > selection happens.  I suspect it will happen in multiple places.  Let's
> > take the DGA model as our motivating example.  This guy is likely to
> > not require too much beyond the domain name.  The model code should
> > pull apart the features from that domain (entropy, the TLD, stripping
> > subdomains, etc.) and probably has some reference data resident within
> > the model (probability of bigrams in various languages, for instance)
> > that it will use to build the real input.  As it stands, a lot of the
> > feature selection is likely to be done in the model, but the model
> > should be explicit about what it wants as input.  For instance, it
> > could demand only the raw domain, or it could demand the subdomains,
> > domain and TLD to be separated out.  In either case, the caching should
> > work as long as the model is deterministic for a given input.
> >
> > I think it will be interesting to see, however, where the feature
> > selection will happen for most models.  This is essentially why I
> > pushed for this to be part of Stellar, so that some transformation can
> > be done prior to model execution.  For instance, for our DGA model, you
> > could call
> > MODEL_APPLY('dga', DOMAIN_REMOVE_TLD(domain), DOMAIN_TO_TLD(domain))
> > which would do the (admittedly not-so) heavy work of separating TLDs
> > from the domain inside of Stellar, as opposed to having to do it in the
> > language that is being used to implement your model.
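
(To make the determinism point above concrete: if the model is a pure
function of its ordered inputs, the bolt can build its cache key from the
model name plus those inputs. A minimal sketch, with made-up names:)

import java.util.List;

public final class ModelCacheKey {

  private ModelCacheKey() {}

  /**
   * e.g. key("dga", Arrays.asList("google", "com")) -> "dga|google|com"
   * Assumes '|' does not occur in the inputs; any unambiguous encoding
   * would work just as well.
   */
  public static String key(String modelName, List<String> inputs) {
    StringBuilder sb = new StringBuilder(modelName);
    for (String input : inputs) {
      sb.append('|').append(input);
    }
    return sb.toString();
  }
}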
> >
> >
> > On Thu, Jul 7, 2016 at 6:33 PM, Simon Ball <sball@hortonworks.com>
> > wrote:
> >
> > > There is an interesting division of concerns here that might impact
> > > the design.
> > >
> > > If we're looking to cache things like DGA which operate on a subset
> > > of the enriched Metron data model, then we essentially need to push
> > > the feature selection, or at least the feature projection elements of
> > > the model, to the edge (the bolt) to produce a cache key.  This seems
> > > to make sense in the context of the proposed function call to the
> > > model, but it means that the model call does not apply to a whole
> > > Metron data record, but to a subset determined by that call in the
> > > DSL.  This implicitly pushes model-related concerns (feature
> > > selection) outside of the canonical scope for defining the models
> > > themselves (the model service), which loses model encapsulation.
> > >
> > > In essence you would be embedding the feature selection (projection)
> > > of the model engine in the storm bolts in order to make caching
> > > possible, which would need some sort of central control and
> > > rationalisation to avoid cache misses between multiple models with
> > > slightly different feature sets.  This could add complexity, or
> > > reduce cache utilisation really quickly with model scale.
> > >
> > > Simon
> > >
> > >
> > > > On 7 Jul 2016, at 18:51, Casey Stella <cestella@gmail.com> wrote:
> > > >
> > > > Great questions Andrew.  Thanks for the interest. :)
> > > >
> > > > RE: "which is why there would be a caching layer set in front of it
> > > > at the Storm bolt level"
> > > >
> > > > Right now we have an LRU caching layer in front of the HBase
> > > > enrichment adapters, so it would work similarly.  You can imagine
> > > > that the range of inputs is likely not perfectly random, so it's
> > > > reasonable for the cache to have a non-empty working set.  Take for
> > > > instance a DGA model; the input would be a domain, and most
> > > > organizations will have an uneven distribution of domains they
> > > > access, with a heavy skew toward a small number.
> > > >
> > > > RE: "In this scenario, you can at least scale out via load balancing
> > > > (i.e. multiple model services serving the same model) since the
> > > > models are immutable."
> > > >
> > > > I am talking about model execution here.  The endpoints are
> > > > distributed across the cluster, and the storm bolt chooses a service
> > > > to use (with a bias toward one that is local to that bolt); the
> > > > request is made to the endpoint, which scores the input and returns
> > > > the response.
> > > >
> > > > Model service, if that term means what I think it means, is almost
> > > > entirely done inside of ZooKeeper.  For clarity, I'm talking about
> > > > service discovery (the bolt discovers which endpoints serve which
> > > > models) and model updates.  We are not sending the model around to
> > > > any bolts or any such thing, just for clarity's sake.
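
(A rough sketch of that discovery step, using Apache Curator's
curator-x-discovery module, which is one common way to do ZooKeeper-based
service discovery; the base path, the Void payload type, and the
local-first pick are illustrative assumptions, not the settled design:)

import java.util.Collection;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.x.discovery.ServiceDiscovery;
import org.apache.curator.x.discovery.ServiceDiscoveryBuilder;
import org.apache.curator.x.discovery.ServiceInstance;

public class ModelEndpointFinder {

  private final ServiceDiscovery<Void> discovery;

  public ModelEndpointFinder(CuratorFramework client) throws Exception {
    this.discovery = ServiceDiscoveryBuilder.builder(Void.class)
        .client(client)
        .basePath("/metron/models")   // assumed ZooKeeper base path
        .build();
    this.discovery.start();
  }

  /** Prefer an endpoint on this host; otherwise fall back to any instance. */
  public ServiceInstance<Void> pick(String modelName, String localHost) throws Exception {
    Collection<ServiceInstance<Void>> instances = discovery.queryForInstances(modelName);
    ServiceInstance<Void> fallback = null;
    for (ServiceInstance<Void> instance : instances) {
      if (localHost.equals(instance.getAddress())) {
        return instance;              // the local bias described above
      }
      fallback = instance;
    }
    return fallback;                  // null if nothing is registered
  }
}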
> > > >
> > > >
> > > >
> > > > On Thu, Jul 7, 2016 at 9:47 AM, Andrew Psaltis <psaltis.andrew@gmail.com>
> > > > wrote:
> > > >
> > > >> Thanks Casey! A couple of quick questions.
> > > >>
> > > >> RE: "which is why there would be a caching layer set in front of it
> > > >> at the Storm bolt level"
> > > >> Hmm, would this be a cache of the results of model execution? Would
> > > >> this really work when each tuple may contain totally different
> > > >> data? Or is the caching going to be smart enough that it will look
> > > >> at all the data passed in, determine that an identical tuple has
> > > >> already been evaluated, and serve the result out of cache?
> > > >>
> > > >> RE: "Also, we would prefer local instances of the service when and
> > > >> where possible"
> > > >> Perfect, makes sense.
> > > >>
> > > >> RE: Serving many models from every storm bolt is also fairly
> > > >> expensive.
> > > >> I can see how it could be, but couldn't we make sure that not all
> > > >> models live in every bolt?
> > > >>
> > > >> RE: In this scenario, you can at least scale out via load balancing
> > > >> (i.e. multiple model services serving the same model) since the
> > > >> models are immutable.
> > > >> This seems to address model serving, not the model execution
> > > >> service. Having yet one more layer to scale and manage also seems
> > > >> like it would further complicate things. Could we not just also
> > > >> scale the bolts?
> > > >>
> > > >> Thanks,
> > > >> Andrew
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Thu, Jul 7, 2016 at 12:37 PM, Casey Stella <cestella@gmail.com>
> > > >> wrote:
> > > >>>
> > > >>> So, regarding the expense of communication: I tend to agree that
> > > >>> it is expensive, which is why there would be a caching layer set
> > > >>> in front of it at the Storm bolt level.  Also, we would prefer
> > > >>> local instances of the service when and where possible.  Serving
> > > >>> many models from every storm bolt is also fairly expensive.  In
> > > >>> this scenario, you can at least scale out via load balancing (i.e.
> > > >>> multiple model services serving the same model) since the models
> > > >>> are immutable.
> > > >>>
> > > >>> On Thu, Jul 7, 2016 at 9:24 AM, Andrew Psaltis <psaltis.andrew@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>>> OK, that makes sense. So the doc attached to this JIRA[1] just
> > > >>>> speaks to the model serving. Is there a doc for the model
> > > >>>> service? And by making this a separate service, are we saying
> > > >>>> that for every “MODEL_APPLY(model_name, param_1, param_2, …,
> > > >>>> param_n)” we are potentially going to go across the wire and have
> > > >>>> a model executed? That seems pretty expensive, no?
> > > >>>>
> > > >>>> Thanks,
> > > >>>> Andrew
> > > >>>>
> > > >>>> [1] https://issues.apache.org/jira/browse/METRON-265
> > > >>>>
> > > >>>> On Thu, Jul 7, 2016 at 12:20 PM, Casey Stella <cestella@gmail.com>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> The "REST" model service, which I place in quotes because there
> > > >>>>> is some strong discussion about whether REST is a reasonable
> > > >>>>> transport for this, is responsible for providing the model.  The
> > > >>>>> scoring/model application happens in the model service, and the
> > > >>>>> results get transferred back to the storm bolt that calls it.
> > > >>>>>
> > > >>>>> Casey
> > > >>>>>
> > > >>>>> On Thu, Jul 7, 2016 at 9:17 AM, Andrew Psaltis <psaltis.andrew@gmail.com>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Trying to make sure I grok this thread and the word doc
> > > >>>>>> attached to the JIRA. The word doc and JIRA speak to a Model
> > > >>>>>> Service and say that the REST service will be responsible for
> > > >>>>>> serving up models. However, part of this conversation seems to
> > > >>>>>> suggest that the model execution will actually occur at the
> > > >>>>>> REST service ... in particular this comment from James:
> > > >>>>>>
> > > >>>>>> "There are several reasons to decouple model execution from
> > > >>>>>> Storm:"
> > > >>>>>>
> > > >>>>>> If the model execution is decoupled from Storm, then it appears
> > > >>>>>> that the REST service will be executing the model, not just
> > > >>>>>> serving it up. Is that correct?
> > > >>>>>>
> > > >>>>>> Thanks,
> > > >>>>>> Andrew
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Thu, Jul 7, 2016 at 11:51 AM, Casey Stella <cestella@gmail.com>
> > > >>>>>> wrote:
> > > >>>>>>
> > > >>>>>>> Regarding the performance of REST:
> > > >>>>>>>
> > > >>>>>>> Yep, so everyone seems to be worried about the performance
> > > >>>>>>> implications of REST.  I made this comment on the JIRA, but
> > > >>>>>>> I'll repeat it here for broader discussion:
> > > >>>>>>>
> > > >>>>>>>> My choice of REST was mostly due to the fact that I want to
> > > >>>>>>>> support multi-language (I think that's a very important
> > > >>>>>>>> requirement) and there are REST libraries for pretty much
> > > >>>>>>>> everything.  I do agree, however, that JSON transport can get
> > > >>>>>>>> chunky.  How about a compromise: use REST, but the input and
> > > >>>>>>>> output payloads for scoring are Maps encoded in msgpack
> > > >>>>>>>> rather than JSON?  There is a msgpack library for pretty much
> > > >>>>>>>> every language out there (almost) and certainly all of the
> > > >>>>>>>> ones we'd like to target.
> > > >>>>>>>
> > > >>>>>>>> The other option is to just create and expose protobuf
> > > >>>>>>>> bindings (thrift doesn't have a native client for R) for all
> > > >>>>>>>> of the languages that we want to support.  I'm perfectly fine
> > > >>>>>>>> with that, but I had some worries about the maturity of the
> > > >>>>>>>> bindings.
> > > >>>>>>>
> > > >>>>>>>> The final option, as you suggest, is to just use raw sockets.
> > > >>>>>>>> I think if we went that route, we might have to create a
> > > >>>>>>>> layer for each language rather than relying on model creators
> > > >>>>>>>> to create a TCP server.  I thought that might be a bit
> > > >>>>>>>> onerous for an MVP.
> > > >>>>>>>
> > > >>>>>>>> Given the discussion, though, what it has made me aware of is
> > > >>>>>>>> that we might not want to dictate a transport mechanism at
> > > >>>>>>>> all, but rather allow that to be pluggable and extensible (so
> > > >>>>>>>> each model would be associated with a transport mechanism
> > > >>>>>>>> handler that would know how to communicate with it.  We would
> > > >>>>>>>> provide default mechanisms for msgpack over REST, JSON over
> > > >>>>>>>> REST, and maybe msgpack over raw TCP.)  Thoughts?
> > > >>>>>>>
> > > >>>>>>>
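(A rough sketch of that pluggable-transport idea, under the
JSON-map-in/JSON-map-out assumption; the interface and registry are
illustrative names only, and no concrete msgpack or REST code is implied:)

import java.util.HashMap;
import java.util.Map;

/** One handler per transport; each scores a map of inputs into a map of outputs. */
interface ModelTransport {
  Map<String, Object> score(Map<String, Object> input) throws Exception;
}

/** Each model's metadata would name the scheme its endpoint speaks. */
final class TransportRegistry {
  private final Map<String, ModelTransport> byScheme = new HashMap<>();

  // e.g. register("json+rest", ...), register("msgpack+rest", ...),
  //      register("msgpack+tcp", ...)
  void register(String scheme, ModelTransport transport) {
    byScheme.put(scheme, transport);
  }

  ModelTransport forScheme(String scheme) {
    ModelTransport transport = byScheme.get(scheme);
    if (transport == null) {
      throw new IllegalArgumentException("No transport registered for: " + scheme);
    }
    return transport;
  }
}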
> > > >>>>>>> Regarding PMML:
> > > >>>>>>>
> > > >>>>>>> I tend to agree with James that PMML is too restrictive as to
> > > >>>>>>> the models it can represent, and I have not had great
> > > >>>>>>> experiences with it in production.  Also, the open source
> > > >>>>>>> libraries for PMML have licensing issues (jpmml requires an
> > > >>>>>>> older version to accommodate our licensing requirements).
> > > >>>>>>>
> > > >>>>>>> Regarding workflow:
> > > >>>>>>>
> > > >>>>>>> At the moment, I'd like to focus on getting a generalized
> > > >>>>>>> infrastructure for model scoring and updating put in place.
> > > >>>>>>> This means this architecture takes up the baton from the point
> > > >>>>>>> when a model is trained/created.  Also, I have attempted to be
> > > >>>>>>> generic in terms of the output of the model (a map of results)
> > > >>>>>>> so it can fit any type of model that I can think of.  If
> > > >>>>>>> that's not the case, let me know, though.
> > > >>>>>>>
> > > >>>>>>> For instance, for clustering, you would probably emit the
> > > >>>>>>> cluster id associated with the input, and that would be added
> > > >>>>>>> to the message as it passes through the storm topology.  The
> > > >>>>>>> model is responsible for processing the input and constructing
> > > >>>>>>> properly formed output.
> > > >>>>>>>
> > > >>>>>>> Casey
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Tue, Jul 5, 2016 at 3:45 PM, Debo Dutta (dedutta) <dedutta@cisco.com>
> > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Following up on the thread a little late …. Awesome start,
> > > >>>>>>>> Casey. Some comments:
> > > >>>>>>>> * Model execution
> > > >>>>>>>> ** I am guessing the model execution will be on YARN only for
> > > >>>>>>>> now. This is fine, but the REST call could have an overhead -
> > > >>>>>>>> it depends on the speed.
> > > >>>>>>>> * PMML: won’t we have to choose some DSL for describing
> > > >>>>>>>> models?
> > > >>>>>>>> * Model:
> > > >>>>>>>> ** workflow vs. a model - do we care about the “workflow”
> > > >>>>>>>> that leads to the models or just the “model”? For example, we
> > > >>>>>>>> might start with n features -> do feature selection to choose
> > > >>>>>>>> k (or apply a transform function) -> apply a model, etc.
> > > >>>>>>>> * Use cases - I can see this working easily for n-ary
> > > >>>>>>>> classification style models. Will the same mechanism be used
> > > >>>>>>>> for stuff like clustering (or intermediate steps like feature
> > > >>>>>>>> selection alone)?
> > > >>>>>>>>
> > > >>>>>>>> Thx
> > > >>>>>>>> debo
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On 7/5/16, 3:24 PM, "James Sirota" <jsirota@apache.org>
> > > >>>>>>>> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>> Simon,
> > > >>>>>>>>>
> > > >>>>>>>>> There are several reasons to decouple model execution from
> > > >>>>>>>>> Storm:
> > > >>>>>>>>>
> > > >>>>>>>>> - Reliability: It's much easier to handle a failed service
> > > >>>>>>>>> than a failed bolt.  You can also troubleshoot without
> > > >>>>>>>>> having to bring down the topology
> > > >>>>>>>>> - Complexity: you de-couple the model logic from Storm logic
> > > >>>>>>>>> and can manage it independently of Storm
> > > >>>>>>>>> - Portability: you can swap the model guts (switch from
> > > >>>>>>>>> Spark to Flink, etc.) and as long as you maintain the
> > > >>>>>>>>> interface you are good to go
> > > >>>>>>>>> - Consistency: since we want to expose our models the same
> > > >>>>>>>>> way we expose threat intel, it makes sense to expose them as
> > > >>>>>>>>> a service
> > > >>>>>>>>>
> > > >>>>>>>>> In our vision for Metron we want to make it easy to uptake
> > > >>>>>>>>> and share models.  I think well-defined interfaces and
> > > >>>>>>>>> programmatic ways of deployment, lifecycle management, and
> > > >>>>>>>>> scoring via well-defined REST interfaces will make this task
> > > >>>>>>>>> easier.
> > > >>>>>>>>>
> > > >>>>>>>>> With respect to PMML, I personally have not had much luck
> > > >>>>>>>>> with it in production.  I would prefer models as POJOs.
> > > >>>>>>>>>
> > > >>>>>>>>> Thanks,
> > > >>>>>>>>> James
> > > >>>>>>>>>
> > > >>>>>>>>> 04.07.2016, 16:07, "Simon Ball" <sball@hortonworks.com>:
> > > >>>>>>>>>> Since the models' parameters and execution algorithm are
> > > >>>>>>>>>> likely to be small, why not have the model store push the
> > > >>>>>>>>>> model changes and scoring direct to the bolts and execute
> > > >>>>>>>>>> within Storm?  This negates the overhead of a REST call to
> > > >>>>>>>>>> the model server, and the need for discovery of the model
> > > >>>>>>>>>> server in ZooKeeper.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Something like the way Ranger policies are updated / cached
> > > >>>>>>>>>> in plugins would seem to make sense, so that we're
> > > >>>>>>>>>> distributing the model execution directly into the
> > > >>>>>>>>>> enrichment pipeline rather than collecting it in a central
> > > >>>>>>>>>> service.
> > > >>>>>>>>>>
> > > >>>>>>>>>> This would work with simple models on single events, but
> > > >>>>>>>>>> may struggle with correlation-based models.  However, those
> > > >>>>>>>>>> could be handled in Storm by pushing into a windowing
> > > >>>>>>>>>> Trident topology or something of the sort, or even with a
> > > >>>>>>>>>> parallel Spark Streaming job using the same method of
> > > >>>>>>>>>> distributing models.
> > > >>>>>>>>>>
> > > >>>>>>>>>> The real challenge here would be stateful online models,
> > > >>>>>>>>>> which seem like a minority case which could be handled by a
> > > >>>>>>>>>> shared state store such as HBase.
> > > >>>>>>>>>>
> > > >>>>>>>>>> You still keep the ability to run different languages and
> > > >>>>>>>>>> platforms, but you wrap managing the parallelism in storm
> > > >>>>>>>>>> bolts rather than YARN containers.
> > > >>>>>>>>>>
> > > >>>>>>>>>> We could also consider basing the model protocol on a
> > > >>>>>>>>>> common model language like PMML, though that is likely to
> > > >>>>>>>>>> be highly limiting.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Simon
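
(A rough sketch of Simon's shared-state-store point above: a stateful
online model reading and writing its state in HBase between scoring calls.
The table and column names are made up; the calls themselves are the stock
HBase 1.x client API.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseModelState implements AutoCloseable {

  private static final byte[] CF = Bytes.toBytes("s");
  private static final byte[] COL = Bytes.toBytes("state");

  private final Connection conn;
  private final Table table;

  public HBaseModelState() throws Exception {
    Configuration conf = HBaseConfiguration.create();
    this.conn = ConnectionFactory.createConnection(conf);
    this.table = conn.getTable(TableName.valueOf("model_state")); // assumed table
  }

  /** Read the serialized state for a model; null if none has been written. */
  public byte[] read(String modelName) throws Exception {
    Result result = table.get(new Get(Bytes.toBytes(modelName)));
    return result.getValue(CF, COL);
  }

  /** Write the updated state back after scoring. */
  public void write(String modelName, byte[] state) throws Exception {
    Put put = new Put(Bytes.toBytes(modelName));
    put.addColumn(CF, COL, state);
    table.put(put);
  }

  @Override
  public void close() throws Exception {
    table.close();
    conn.close();
  }
}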
> > > >>>>>>>>>>
> > > >>>>>>>>>>> On 4 Jul 2016, at 22:35, Casey Stella <cestella@gmail.com>
> > > >>>>>>>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> This is great! I'll capture any requirements that anyone
> > > >>>>>>>>>>> wants to contribute and ensure that the proposed
> > > >>>>>>>>>>> architecture accommodates them. I think we should focus on
> > > >>>>>>>>>>> a minimal set of requirements and an architecture that
> > > >>>>>>>>>>> does not preclude a larger set. I have found that the best
> > > >>>>>>>>>>> driver of requirements is installed users. :)
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> For instance, I think a lot of questions about how often
> > > >>>>>>>>>>> to update a model and such should be represented in the
> > > >>>>>>>>>>> architecture by the ability to manually update a model; as
> > > >>>>>>>>>>> long as we have the ability to update, people can choose
> > > >>>>>>>>>>> when and where to do it (i.e. time based or some other
> > > >>>>>>>>>>> trigger). That being said, we don't want to cause too much
> > > >>>>>>>>>>> effort for the user if we can avoid it with features.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> In terms of the questions laid out, here are the
> > > >>>>>>>>>>> constraints from the proposed architecture as I see them.
> > > >>>>>>>>>>> It'd be great to get a sense of whether these constraints
> > > >>>>>>>>>>> are too onerous or where they're not opinionated enough:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>  - Model versioning and retention
> > > >>>>>>>>>>>    - We do have the ability to update models, but the
> > > >>>>>>>>>>>      training and the decision of when to update the model
> > > >>>>>>>>>>>      are left up to the user. We may want to think deeply
> > > >>>>>>>>>>>      about when and where automated model updates can fit
> > > >>>>>>>>>>>    - Also, retention is currently manual. It might be an
> > > >>>>>>>>>>>      easier win to set up policies around when to sunset
> > > >>>>>>>>>>>      models (after newer versions are added, for instance).
> > > >>>>>>>>>>>  - Model access controls management
> > > >>>>>>>>>>>    - The architecture proposes no constraints around this.
> > > >>>>>>>>>>>      As it stands now, models are held in HDFS, so they
> > > >>>>>>>>>>>      would inherit the same security capabilities from
> > > >>>>>>>>>>>      that (user/group permissions + Ranger, etc.)
> > > >>>>>>>>>>>  - Requirements around concept drift
> > > >>>>>>>>>>>    - I'd love to hear user requirements around how we
> > > >>>>>>>>>>>      could automatically address concept drift. The
> > > >>>>>>>>>>>      architecture as it's proposed lets the user decide
> > > >>>>>>>>>>>      when to update models.
> > > >>>>>>>>>>>  - Requirements around model output
> > > >>>>>>>>>>>    - The architecture as it stands just mandates a JSON
> > > >>>>>>>>>>>      map input and a JSON map output, so it's up to the
> > > >>>>>>>>>>>      model what it wants to pass back (a minimal sketch of
> > > >>>>>>>>>>>      that contract follows this list).
> > > >>>>>>>>>>>    - It's also up to the model to document its own output.
> > > >>>>>>>>>>>  - Any model audit and logging requirements
> > > >>>>>>>>>>>    - The architecture proposes no constraints around this.
> > > >>>>>>>>>>>      I'd love to see community guidance around this. As it
> > > >>>>>>>>>>>      stands, we just log using the same mechanism as any
> > > >>>>>>>>>>>      YARN application.
> > > >>>>>>>>>>>  - What model metrics need to be exposed
> > > >>>>>>>>>>>    - The architecture proposes no constraints around this.
> > > >>>>>>>>>>>      I'd love to see community guidance around this.
> > > >>>>>>>>>>>  - Requirements around failure modes
> > > >>>>>>>>>>>    - We briefly touch on this in the document, but it is
> > > >>>>>>>>>>>      probably not complete. Service endpoint failure will
> > > >>>>>>>>>>>      result in blacklisting from a storm bolt perspective,
> > > >>>>>>>>>>>      and node failure should result in a new container
> > > >>>>>>>>>>>      being started by the YARN application master. Beyond
> > > >>>>>>>>>>>      that, the architecture isn't explicit.
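
(The sketch referenced in the model-output bullet above: as proposed, the
entire scoring contract is a map in and a map out; the interface name is
illustrative. A clustering model, for example, might return a single
"cluster_id" entry that gets merged back into the message.)

import java.util.Map;

/** Proposed contract: a JSON map of fields in, a JSON map of results out. */
interface ScoringModel {
  // input:  fields from the Metron message (or a projection of them)
  // output: results merged back into the message as it flows on
  Map<String, Object> apply(Map<String, Object> input);
}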
> > > >>>>>>>>>>>> On Mon, Jul 4, 2016 at 1:49 PM, James Sirota <jsirota@apache.org>
> > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I left a comment on the JIRA. I think your design is
> > > >>>>>>>>>>>> promising. One other thing I would suggest is for us to
> > > >>>>>>>>>>>> crowdsource requirements around model management.
> > > >>>>>>>>>>>> Specifically:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Model versioning and retention
> > > >>>>>>>>>>>> Model access controls management
> > > >>>>>>>>>>>> Requirements around concept drift
> > > >>>>>>>>>>>> Requirements around model output
> > > >>>>>>>>>>>> Any model audit and logging requirements
> > > >>>>>>>>>>>> What model metrics need to be exposed
> > > >>>>>>>>>>>> Requirements around failure modes
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> 03.07.2016, 14:00, "Casey Stella" <cestella@gmail.com>:
> > > >>>>>>>>>>>>> Hi all,
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> I think we are at the point where we should try to
> > > >>>>>>>>>>>>> tackle Model as a Service for Metron. As such, I created
> > > >>>>>>>>>>>>> a JIRA and proposed an architecture for accomplishing
> > > >>>>>>>>>>>>> this within Metron.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> My inclination is to be data science language/library
> > > >>>>>>>>>>>>> agnostic and to provide a general purpose REST
> > > >>>>>>>>>>>>> infrastructure for managing and serving models trained
> > > >>>>>>>>>>>>> on historical data captured from Metron. The assumption
> > > >>>>>>>>>>>>> is that we are within the Hadoop ecosystem, so:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>  - Models stored on HDFS
> > > >>>>>>>>>>>>>  - REST Model Services resource-managed via YARN
> > > >>>>>>>>>>>>>  - REST Model Services discovered via ZooKeeper
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> I would really appreciate community comment on the JIRA
> > > >>>>>>>>>>>>> (https://issues.apache.org/jira/browse/METRON-265). The
> > > >>>>>>>>>>>>> proposed architecture is attached as a document to that
> > > >>>>>>>>>>>>> JIRA.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> I look forward to feedback!
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Best,
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> Casey
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> -------------------
> > > >>>>>>>>>>>> Thank you,
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> James Sirota
> > > >>>>>>>>>>>> PPMC- Apache Metron (Incubating)
> > > >>>>>>>>>>>> jsirota AT apache DOT org
> > > >>>>>>>>>
> > > >>>>>>>>> -------------------
> > > >>>>>>>>> Thank you,
> > > >>>>>>>>>
> > > >>>>>>>>> James Sirota
> > > >>>>>>>>> PPMC- Apache Metron (Incubating)
> > > >>>>>>>>> jsirota AT apache DOT org
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> --
> > > >>>>>> Thanks,
> > > >>>>>> Andrew
> > > >>>>>>
> > > >>>>>> Subscribe to my book: Streaming Data <http://manning.com/psaltis>
> > > >>>>>> <https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
> > > >>>>>> twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> Thanks,
> > > >>>> Andrew
> > > >>>>
> > > >>>> Subscribe to my book: Streaming Data <http://manning.com/psaltis>
> > > >>>> <https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
> > > >>>> twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>
> > > >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> Thanks,
> > > >> Andrew
> > > >>
> > > >> Subscribe to my book: Streaming Data <http://manning.com/psaltis>
> > > >> <https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
> > > >> twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>
> > > >>
> > >
> >
>
>
>
> --
> Thanks,
> Andrew
>
> Subscribe to my book: Streaming Data <http://manning.com/psaltis>
> <https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
> twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>
>
