metron-dev mailing list archives

From Casey Stella <ceste...@gmail.com>
Subject Re: Metron-265 Model as a Service
Date Thu, 07 Jul 2016 19:00:55 GMT
> Considering both the storm bolts and the model service will be deployed
> on Yarn, could the bolts not use the Yarn registry to identify which
> model service to connect to before making a request?


The bolts are definitely going to figure out which endpoints are serving
which models, but that info will come from zookeeper and get pushed to the
bolts on change, rather than having a separate request to the yarn registry.
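
Just to make that concrete, here is a minimal sketch of the kind of watch a
bolt could register, using Curator's PathChildrenCache.  The znode layout is
purely illustrative; nothing here is settled:

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.recipes.cache.PathChildrenCache;

    public class ModelEndpointWatcher {
      // Hypothetical znode layout: /metron/models/<model_name>/<endpoint>
      public static PathChildrenCache watch(CuratorFramework client,
                                            String modelName) throws Exception {
        PathChildrenCache cache =
            new PathChildrenCache(client, "/metron/models/" + modelName, true);
        cache.getListenable().addListener((c, event) -> {
          // Fires on endpoint add/update/remove; the bolt would refresh
          // its local endpoint list here instead of polling.
          System.out.println(event.getType() + " -> " + event.getData());
        });
        cache.start();
        return cache;
      }
    }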

> How do you scale the model service endpoints if they have a preference for
> which model they serve?


I'd say preference is a loose term.  We'll probably just use a weighted die
and bias the choice toward local endpoints over remote endpoints.  Let's
all keep in mind here that there are real reasons why you might not have a
model executed from the same node as a storm worker.  Take, for instance, a
tensorflow model that *needs* GPUs; you might never run a storm worker on
those nodes.  In that situation, the network hop will probably be dominated
by the computation done in scoring, and it's probably not cost effective to
scale storm along with GPU nodes.
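
To sketch the weighted die (the 3:1 local bias below is an illustrative
number, not a tuned one):

    import java.util.List;
    import java.util.concurrent.ThreadLocalRandom;

    public class EndpointChooser {
      // Pick an endpoint, biasing toward ones local to this worker.
      public static String choose(List<String> endpoints, String localHost) {
        double[] weights = new double[endpoints.size()];
        double total = 0;
        for (int i = 0; i < endpoints.size(); i++) {
          weights[i] = endpoints.get(i).contains(localHost) ? 3.0 : 1.0;
          total += weights[i];
        }
        // Roll the weighted die and walk the cumulative weights.
        double roll = ThreadLocalRandom.current().nextDouble(total);
        for (int i = 0; i < weights.length; i++) {
          roll -= weights[i];
          if (roll < 0) {
            return endpoints.get(i);
          }
        }
        return endpoints.get(endpoints.size() - 1);  // guard for rounding
      }
    }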


> And each is a simple REST (or another more performant protocol) service
> as the document describes?


Yep.
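
For a sense of how thin that service can be, here's a toy endpoint using the
JDK's built-in HTTP server; the /apply path and the response field are
assumptions for illustration only:

    import com.sun.net.httpserver.HttpServer;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    public class ToyModelEndpoint {
      public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(1500), 0);
        server.createContext("/apply", exchange -> {
          // A real endpoint would parse the input map and run the model;
          // here we return a canned score map.
          byte[] out = "{\"is_malicious\":\"legit\"}"
              .getBytes(StandardCharsets.UTF_8);
          exchange.sendResponseHeaders(200, out.length);
          exchange.getResponseBody().write(out);
          exchange.close();
        });
        server.start();
      }
    }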


On Thu, Jul 7, 2016 at 11:14 AM, Andrew Psaltis <psaltis.andrew@gmail.com>
wrote:

> Thanks Casey, that helps.
>
> RE: I am talking about model execution here.  The endpoints are distributed
> across the cluster and the storm bolt chooses a service to use (with a bias
> toward using one that is local to that bolt) and the request is made to the
> endpoint, which scores the input and returns the response.
>
> This makes sense. Depending on the volume and velocity of data, it seems
> like this could get expensive.
>
>
> RE: Model service, if that term means what I think it means, is almost
> entirely done inside of zookeeper.  For clarity, I'm talking about service
> discovery (bolt discovers which endpoints serve which models) and model
> updates
>
> Thanks this helps to clarify it quite a bit.  Considering both the storm
> bolts and the model service will be deployed on Yarn, could the bolts not
> use the Yarn registry to identify which model service to connect to before
> making a request?
>
> How do you scale the model service endpoints if they have a preference for
> which model they serve? And each is a simple REST (or another more
> performant protocol) service as the document describes?
>
>
>
> Thanks,
> Andrew
>
> On Thu, Jul 7, 2016 at 1:51 PM, Casey Stella <cestella@gmail.com> wrote:
>
> > Great questions, Andrew.  Thanks for the interest. :)
> >
> > RE: "which is why there would be a caching layer set in front of it
> > at the Storm bolt level"
> >
> > Right now we have an LRU caching layer in front of the HBase
> > enrichment adapters, so it would work similarly.  You can imagine the
> > range of inputs is likely not perfectly random, so it's reasonable for
> > the cache to have a non-empty working set.  Take for instance a DGA
> > model; the input would be a domain, and most organizations will have
> > an uneven distribution of domains they access, with a heavy skew
> > toward a small number.
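> >
> > (A sketch of that cache, assuming Guava; the size and expiry bounds
> > below are illustrative, not tuned:)
> >
> >     import com.google.common.cache.Cache;
> >     import com.google.common.cache.CacheBuilder;
> >     import java.util.Map;
> >     import java.util.concurrent.TimeUnit;
> >
> >     public class ScoreCache {
> >       // Keyed on the model input (e.g. the domain for a DGA model);
> >       // the value is the score map returned by the endpoint.
> >       private final Cache<String, Map<String, Object>> cache =
> >           CacheBuilder.newBuilder()
> >                       .maximumSize(10000)
> >                       .expireAfterWrite(10, TimeUnit.MINUTES)
> >                       .build();
> >     }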
> >
> > RE: In this scenario, you can at least scale out via load balancing
> > (i.e. multiple model services serving the same model) since the models
> > are immutable.
> >
> > I am talking about model execution here.  The endpoints are
> > distributed across the cluster, the storm bolt chooses a service to
> > use (with a bias toward using one that is local to that bolt), and the
> > request is made to the endpoint, which scores the input and returns
> > the response.
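> >
> > (Roughly, from the bolt's side, the request could look like the
> > following; the /apply path is an assumption:)
> >
> >     import java.io.InputStream;
> >     import java.io.OutputStream;
> >     import java.net.HttpURLConnection;
> >     import java.net.URL;
> >     import java.nio.charset.StandardCharsets;
> >
> >     public class ModelClient {
> >       // POST the serialized input map to the chosen endpoint and
> >       // return the raw score payload.
> >       public static String score(String endpoint, String jsonInput)
> >           throws Exception {
> >         HttpURLConnection conn = (HttpURLConnection)
> >             new URL(endpoint + "/apply").openConnection();
> >         conn.setRequestMethod("POST");
> >         conn.setDoOutput(true);
> >         try (OutputStream os = conn.getOutputStream()) {
> >           os.write(jsonInput.getBytes(StandardCharsets.UTF_8));
> >         }
> >         try (InputStream is = conn.getInputStream()) {
> >           return new String(is.readAllBytes(), StandardCharsets.UTF_8);
> >         }
> >       }
> >     }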
> >
> > Model service, if that term means what I think it means, is almost
> > entirely done inside of zookeeper.  For clarity, I'm talking about
> > service discovery (the bolt discovers which endpoints serve which
> > models) and model updates.  We are not sending the model around to any
> > bolts or any such thing, just for clarity's sake.
> >
> >
> >
> > On Thu, Jul 7, 2016 at 9:47 AM, Andrew Psaltis
> > <psaltis.andrew@gmail.com> wrote:
> >
> > > Thanks Casey! Couple of quick questions.
> > >
> > > RE: "which is why there would be a caching layer set in front of it
> > > at the Storm bolt level"
> > > Hmm, would this be a cache of the results of model execution? Would
> > > this really work when each tuple may contain totally different data?
> > > Or is the caching going to be smart enough that it will look at all
> > > the data passed in, determine that an identical tuple has already
> > > been evaluated, and serve the result out of cache?
> > >
> > > RE: "Also, we would prefer local instances of the service when and
> > > where possible"
> > > Perfect, makes sense.
> > >
> > > RE: Serving many models from every storm bolt is also fairly
> > > expensive.
> > > I can see how it could be, but couldn't we make sure that not all
> > > models live in every bolt?
> > >
> > > RE: In this scenario, you can at least scale out via load balancing
> > > (i.e. multiple model services serving the same model) since the
> > > models are immutable.
> > > This seems to address model serving, not the model execution
> > > service.  Having yet one more layer to scale and manage also seems
> > > like it would further complicate things.  Could we not just scale
> > > the bolts as well?
> > >
> > > Thanks,
> > > Andrew
> > >
> > >
> > >
> > >
> > > On Thu, Jul 7, 2016 at 12:37 PM, Casey Stella <cestella@gmail.com>
> > > wrote:
> > >
> > > > So, regarding the expense of communication: I tend to agree that
> > > > it is expensive, which is why there would be a caching layer set in
> > > > front of it at the Storm bolt level.  Also, we would prefer local
> > > > instances of the service when and where possible.  Serving many
> > > > models from every storm bolt is also fairly expensive.  In this
> > > > scenario, you can at least scale out via load balancing (i.e.
> > > > multiple model services serving the same model) since the models
> > > > are immutable.
> > > >
> > > > On Thu, Jul 7, 2016 at 9:24 AM, Andrew Psaltis
> > > > <psaltis.andrew@gmail.com> wrote:
> > > >
> > > > > OK, that makes sense. So the doc attached to this JIRA [1] just
> > > > > speaks to model serving. Is there a doc for the model service?
> > > > > And by making this a separate service, are we saying that for
> > > > > every “MODEL_APPLY(model_name, param_1, param_2, …, param_n)” we
> > > > > are potentially going to go across the wire and have a model
> > > > > executed? That seems pretty expensive, no?
> > > > >
> > > > > Thanks,
> > > > > Andrew
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/METRON-265
> > > > >
> > > > > On Thu, Jul 7, 2016 at 12:20 PM, Casey Stella
> > > > > <cestella@gmail.com> wrote:
> > > > >
> > > > > > The "REST" model service, which I place in quotes because there
> is
> > > some
> > > > > > strong discussion about whether REST is a reasonable transport
> for
> > > > this,
> > > > > is
> > > > > > responsible for providing the model.  The scoring/model
> application
> > > > > happens
> > > > > > in the model service and the results get transferred back to
the
> > > storm
> > > > > bolt
> > > > > > that calls it.
> > > > > >
> > > > > > Casey
> > > > > >
> > > > > > On Thu, Jul 7, 2016 at 9:17 AM, Andrew Psaltis
> > > > > > <psaltis.andrew@gmail.com> wrote:
> > > > > >
> > > > > > > Trying to make sure I grok this thread and the word doc
> > > > > > > attached to the JIRA. The word doc and JIRA speak to a Model
> > > > > > > Service and say that the REST service will be responsible for
> > > > > > > serving up models. However, part of this conversation seems
> > > > > > > to suggest that the model execution will actually occur at
> > > > > > > the REST service ... in particular this comment from James:
> > > > > > >
> > > > > > > "There are several reasons to decouple model execution from
> > > > > > > Storm:"
> > > > > > >
> > > > > > > If the model execution is decoupled from Storm, then it
> > > > > > > appears that the REST service will be executing the model,
> > > > > > > not just serving it up. Is that correct?
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Andrew
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Jul 7, 2016 at 11:51 AM, Casey Stella
> > > > > > > <cestella@gmail.com> wrote:
> > > > > > >
> > > > > > > > Regarding the performance of REST:
> > > > > > > >
> > > > > > > > Yep, so everyone seems to be worried about the performance
> > > > > > > > implications of REST.  I made this comment on the JIRA, but
> > > > > > > > I'll repeat it here for broader discussion:
> > > > > > > >
> > > > > > > > > My choice of REST was mostly due to the fact that I want
> > > > > > > > > to support multi-language (I think that's a very
> > > > > > > > > important requirement) and there are REST libraries for
> > > > > > > > > pretty much everything. I do agree, however, that JSON
> > > > > > > > > transport can get chunky. How about a compromise and use
> > > > > > > > > REST, but the input and output payloads for scoring are
> > > > > > > > > Maps encoded in msgpack rather than JSON. There is a
> > > > > > > > > msgpack library for pretty much every language out there
> > > > > > > > > (almost) and certainly all of the ones we'd like to
> > > > > > > > > target.
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > The other option is to just create and expose protobuf
> > > > > > > > > bindings (thrift doesn't have a native client for R) for
> > > > > > > > > all of the languages that we want to support. I'm
> > > > > > > > > perfectly fine with that, but I had some worries about
> > > > > > > > > the maturity of the bindings.
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > The final option, as you suggest, is to just use raw
> > > > > > > > > sockets. I think if we went that route, we might have to
> > > > > > > > > create a layer for each language rather than relying on
> > > > > > > > > model creators to create a TCP server. I thought that
> > > > > > > > > might be a bit onerous for an MVP.
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > > Given the discussion, though, what it has made me aware
> > > > > > > > > of is that we might not want to dictate a transport
> > > > > > > > > mechanism at all, but rather allow that to be pluggable
> > > > > > > > > and extensible (so each model would be associated with a
> > > > > > > > > transport mechanism handler that would know how to
> > > > > > > > > communicate with it. We would provide default mechanisms
> > > > > > > > > for msgpack over REST, JSON over REST, and maybe msgpack
> > > > > > > > > over raw TCP.) Thoughts?
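> > > > > > > >
> > > > > > > > (To make the msgpack-over-REST default concrete, packing a
> > > > > > > > scoring input as a msgpack-encoded Map might look like the
> > > > > > > > following with msgpack-java; the field name is purely
> > > > > > > > illustrative:)
> > > > > > > >
> > > > > > > >     import org.msgpack.core.MessageBufferPacker;
> > > > > > > >     import org.msgpack.core.MessagePack;
> > > > > > > >
> > > > > > > >     public class PayloadExample {
> > > > > > > >       public static byte[] pack() throws Exception {
> > > > > > > >         MessageBufferPacker packer =
> > > > > > > >             MessagePack.newDefaultBufferPacker();
> > > > > > > >         packer.packMapHeader(1);      // one key/value pair
> > > > > > > >         packer.packString("domain");  // illustrative field
> > > > > > > >         packer.packString("example.com");
> > > > > > > >         packer.close();
> > > > > > > >         return packer.toByteArray();  // REST request body
> > > > > > > >       }
> > > > > > > >     }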
> > > > > > > >
> > > > > > > >
> > > > > > > > Regarding PMML:
> > > > > > > >
> > > > > > > > I tend to agree with James that PMML is too restrictive as
> > > > > > > > to the models it can represent, and I have not had great
> > > > > > > > experiences with it in production.  Also, the open source
> > > > > > > > libraries for PMML have licensing issues (jpmml requires an
> > > > > > > > older version to accommodate our licensing requirements).
> > > > > > > >
> > > > > > > > Regarding workflow:
> > > > > > > >
> > > > > > > > At the moment, I'd like to focus on getting a generalized
> > > > > > > > infrastructure for model scoring and updating put in place.
> > > > > > > > This means this architecture takes up the baton from the
> > > > > > > > point when a model is trained/created.  Also, I have
> > > > > > > > attempted to be generic in terms of the output of the model
> > > > > > > > (a map of results) so it can fit any type of model that I
> > > > > > > > can think of.  If that's not the case, let me know, though.
> > > > > > > >
> > > > > > > > For instance, for clustering, you would probably emit the
> > > > > > > > cluster id associated with the input, and that would be
> > > > > > > > added to the message as it passes through the storm
> > > > > > > > topology.  The model is responsible for processing the
> > > > > > > > input and constructing properly formed output.
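> > > > > > > >
> > > > > > > > (In code terms, the contract is just map-in/map-out; for
> > > > > > > > the clustering example, something like the following, with
> > > > > > > > the field name being hypothetical:)
> > > > > > > >
> > > > > > > >     import java.util.HashMap;
> > > > > > > >     import java.util.Map;
> > > > > > > >
> > > > > > > >     public class ClusteringModelExample {
> > > > > > > >       // The input is the message's field map; the result
> > > > > > > >       // map is merged back into the message downstream.
> > > > > > > >       public static Map<String, Object> apply(
> > > > > > > >           Map<String, Object> input) {
> > > > > > > >         Map<String, Object> result = new HashMap<>();
> > > > > > > >         result.put("cluster_id", 7);  // hypothetical field
> > > > > > > >         return result;
> > > > > > > >       }
> > > > > > > >     }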
> > > > > > > >
> > > > > > > > Casey
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Jul 5, 2016 at 3:45 PM, Debo Dutta (dedutta)
> > > > > > > > <dedutta@cisco.com> wrote:
> > > > > > > >
> > > > > > > > > Following up on the thread a little late …. Awesome
> > > > > > > > > start, Casey.  Some comments:
> > > > > > > > > * Model execution
> > > > > > > > > ** I am guessing the model execution will be on YARN only
> > > > > > > > > for now. This is fine, but the REST call could have an
> > > > > > > > > overhead - depends on the speed.
> > > > > > > > > * PMML: won’t we have to choose some DSL for describing
> > > > > > > > > models?
> > > > > > > > > * Model:
> > > > > > > > > ** workflow vs a model - do we care about the “workflow”
> > > > > > > > > that leads to the models or just the “model”? For
> > > > > > > > > example, we might start with n features —> do feature
> > > > > > > > > selection to choose k (or apply a transform function) —>
> > > > > > > > > apply a model, etc.
> > > > > > > > > * Use cases - I can see this working for n-ary
> > > > > > > > > classification style models easily. Will the same
> > > > > > > > > mechanism be used for stuff like clustering (or
> > > > > > > > > intermediate steps like feature selection alone)?
> > > > > > > > >
> > > > > > > > > Thx
> > > > > > > > > debo
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 7/5/16, 3:24 PM, "James Sirota" <jsirota@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > >Simon,
> > > > > > > > > >
> > > > > > > > > >There are several reasons to decouple model execution
> > > > > > > > > >from Storm:
> > > > > > > > > >
> > > > > > > > > >- Reliability: It's much easier to handle a failed
> > > > > > > > > >service than a failed bolt.  You can also troubleshoot
> > > > > > > > > >without having to bring down the topology
> > > > > > > > > >- Complexity: you de-couple the model logic from Storm
> > > > > > > > > >logic and can manage it independently of Storm
> > > > > > > > > >- Portability: you can swap the model guts (switch from
> > > > > > > > > >Spark to Flink, etc) and as long as you maintain the
> > > > > > > > > >interface you are good to go
> > > > > > > > > >- Consistency: since we want to expose our models the
> > > > > > > > > >same way we expose threat intel, it makes sense to
> > > > > > > > > >expose them as a service
> > > > > > > > > >
> > > > > > > > > >In our vision for Metron we want to make it easy to
> > > > > > > > > >uptake and share models.  I think well-defined
> > > > > > > > > >interfaces and programmatic ways of deployment,
> > > > > > > > > >lifecycle management, and scoring via well-defined REST
> > > > > > > > > >interfaces will make this task easier.  We can do a few
> > > > > > > > > >things to
> > > > > > > > > >
> > > > > > > > > >With respect to PMML, I personally have not had much
> > > > > > > > > >luck with it in production.  I would prefer models as
> > > > > > > > > >POJOs.
> > > > > > > > > >
> > > > > > > > > >Thanks,
> > > > > > > > > >James
> > > > > > > > > >
> > > > > > > > > >04.07.2016, 16:07, "Simon Ball" <sball@hortonworks.com>:
> > > > > > > > > >> Since the models' parameters and execution algorithm
> > > > > > > > > >> are likely to be small, why not have the model store
> > > > > > > > > >> push the model changes and scoring direct to the
> > > > > > > > > >> bolts and execute within storm?  This negates the
> > > > > > > > > >> overhead of a rest call to the model server, and the
> > > > > > > > > >> need for discovery of the model server in zookeeper.
> > > > > > > > > >>
> > > > > > > > > >> Something like the way ranger policies are updated /
> > > > > > > > > >> cached in plugins would seem to make sense, so that
> > > > > > > > > >> we're distributing the model execution directly into
> > > > > > > > > >> the enrichment pipeline rather than collecting in a
> > > > > > > > > >> central service.
> > > > > > > > > >>
> > > > > > > > > >> This would work with simple models on single events,
> > > > > > > > > >> but may struggle with correlation based models.
> > > > > > > > > >> However, those could be handled in storm by pushing
> > > > > > > > > >> into a windowing trident topology or something of the
> > > > > > > > > >> sort, or even with a parallel spark streaming job
> > > > > > > > > >> using the same method of distributing models.
> > > > > > > > > >>
> > > > > > > > > >> The real challenge here would be stateful online
> > > > > > > > > >> models, which seem like a minority case that could be
> > > > > > > > > >> handled by a shared state store such as HBase.
> > > > > > > > > >>
> > > > > > > > > >> You still keep the ability to run different languages
> > > > > > > > > >> and platforms, but wrap managing the parallelism in
> > > > > > > > > >> storm bolts rather than yarn containers.
> > > > > > > > > >>
> > > > > > > > > >> We could also consider basing the model protocol on a
> > > > > > > > > >> common model language like pmml, though that is
> > > > > > > > > >> likely to be highly limiting.
> > > > > > > > > >>
> > > > > > > > > >> Simon
> > > > > > > > > >>
> > > > > > > > > >>>  On 4 Jul 2016, at 22:35, Casey Stella
> > > > > > > > > >>>  <cestella@gmail.com> wrote:
> > > > > > > > > >>>
> > > > > > > > > >>>  This is great! I'll capture any requirements that
> > > > > > > > > >>>  anyone wants to contribute and ensure that the
> > > > > > > > > >>>  proposed architecture accommodates them. I think we
> > > > > > > > > >>>  should focus on a minimal set of requirements and
> > > > > > > > > >>>  an architecture that does not preclude a larger
> > > > > > > > > >>>  set. I have found that the best driver of
> > > > > > > > > >>>  requirements is installed users. :)
> > > > > > > > > >>>
> > > > > > > > > >>>  For instance, I think a lot of questions about how
> > > > > > > > > >>>  often to update a model and such should be
> > > > > > > > > >>>  represented in the architecture by the ability to
> > > > > > > > > >>>  manually update a model; as long as we have the
> > > > > > > > > >>>  ability to update, people can choose when and where
> > > > > > > > > >>>  to do it (i.e. time based or some other trigger).
> > > > > > > > > >>>  That being said, we don't want to cause too much
> > > > > > > > > >>>  effort for the user if we can avoid it with
> > > > > > > > > >>>  features.
> > > > > > > > > >>>
> > > > > > > > > >>>  In terms of the questions laid out, here are the
> > > > > > > > > >>>  constraints from the proposed architecture as I see
> > > > > > > > > >>>  them. It'd be great to get a sense of whether these
> > > > > > > > > >>>  constraints are too onerous or where they're not
> > > > > > > > > >>>  opinionated enough:
> > > > > > > > > >>>
> > > > > > > > > >>>    - Model versioning and retention
> > > > > > > > > >>>       - We do have the ability to update models, but
> > > > > > > > > >>>       the training and the decision of when to
> > > > > > > > > >>>       update the model are left up to the user. We
> > > > > > > > > >>>       may want to think deeply about when and where
> > > > > > > > > >>>       automated model updates can fit.
> > > > > > > > > >>>       - Also, retention is currently manual. It
> > > > > > > > > >>>       might be an easier win to set up policies
> > > > > > > > > >>>       around when to sunset models (after newer
> > > > > > > > > >>>       versions are added, for instance).
> > > > > > > > > >>>    - Model access controls management
> > > > > > > > > >>>       - The architecture proposes no constraints
> > > > > > > > > >>>       around this. As it stands now, models are held
> > > > > > > > > >>>       in HDFS, so they would inherit the same
> > > > > > > > > >>>       security capabilities from that (user/group
> > > > > > > > > >>>       permissions + Ranger, etc).
> > > > > > > > > >>>    - Requirements around concept drift
> > > > > > > > > >>>       - I'd love to hear user requirements around
> > > > > > > > > >>>       how we could automatically address concept
> > > > > > > > > >>>       drift. The architecture as it's proposed lets
> > > > > > > > > >>>       the user decide when to update models.
> > > > > > > > > >>>    - Requirements around model output
> > > > > > > > > >>>       - The architecture as it stands just mandates
> > > > > > > > > >>>       a JSON map input and a JSON map output, so
> > > > > > > > > >>>       it's up to the model what it wants to pass
> > > > > > > > > >>>       back.
> > > > > > > > > >>>       - It's also up to the model to document its
> > > > > > > > > >>>       own output.
> > > > > > > > > >>>    - Any model audit and logging requirements
> > > > > > > > > >>>       - The architecture proposes no constraints
> > > > > > > > > >>>       around this. I'd love to see community
> > > > > > > > > >>>       guidance here. As it stands, we just log using
> > > > > > > > > >>>       the same mechanism as any YARN application.
> > > > > > > > > >>>    - What model metrics need to be exposed
> > > > > > > > > >>>       - The architecture proposes no constraints
> > > > > > > > > >>>       around this. I'd love to see community
> > > > > > > > > >>>       guidance here.
> > > > > > > > > >>>    - Requirements around failure modes
> > > > > > > > > >>>       - We briefly touch on this in the document,
> > > > > > > > > >>>       but it is probably not complete. Service
> > > > > > > > > >>>       endpoint failure will result in blacklisting
> > > > > > > > > >>>       from a storm bolt perspective, and node
> > > > > > > > > >>>       failure should result in a new container being
> > > > > > > > > >>>       started by the Yarn application master. Beyond
> > > > > > > > > >>>       that, the architecture isn't explicit.
> > > > > > > > > >>>
> > > > > > > > > >>>>  On Mon, Jul 4, 2016 at 1:49 PM, James Sirota
> > > > > > > > > >>>>  <jsirota@apache.org> wrote:
> > > > > > > > > >>>>
> > > > > > > > > >>>>  I left a comment on the JIRA.  I think your
> > > > > > > > > >>>>  design is promising.  One other thing I would
> > > > > > > > > >>>>  suggest is for us to crowd-source requirements
> > > > > > > > > >>>>  around model management. Specifically:
> > > > > > > > > >>>>
> > > > > > > > > >>>>  Model versioning and retention
> > > > > > > > > >>>>  Model access controls management
> > > > > > > > > >>>>  Requirements around concept drift
> > > > > > > > > >>>>  Requirements around model output
> > > > > > > > > >>>>  Any model audit and logging requirements
> > > > > > > > > >>>>  What model metrics need to be exposed
> > > > > > > > > >>>>  Requirements around failure modes
> > > > > > > > > >>>>
> > > > > > > > > >>>>  03.07.2016, 14:00, "Casey Stella"
<
> cestella@gmail.com
> > >:
> > > > > > > > > >>>>>  Hi all,
> > > > > > > > > >>>>>
> > > > > > > > > >>>>>  I think we are at the point where we should try
> > > > > > > > > >>>>>  to tackle Model as a Service for Metron. As
> > > > > > > > > >>>>>  such, I created a JIRA and proposed an
> > > > > > > > > >>>>>  architecture for accomplishing this within
> > > > > > > > > >>>>>  Metron.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>>  My inclination is to be data science
> > > > > > > > > >>>>>  language/library agnostic and to provide a
> > > > > > > > > >>>>>  general purpose REST infrastructure for managing
> > > > > > > > > >>>>>  and serving models trained on historical data
> > > > > > > > > >>>>>  captured from Metron.  The assumption is that we
> > > > > > > > > >>>>>  are within the hadoop ecosystem, so:
> > > > > > > > > >>>>>
> > > > > > > > > >>>>>    - Models stored on HDFS
> > > > > > > > > >>>>>    - REST Model Services resource-managed via
> > > > > > > > > >>>>>      Yarn
> > > > > > > > > >>>>>    - REST Model Services discovered via Zookeeper
> > > > > > > > > >>>>>
> > > > > > > > > >>>>>  I would really appreciate community comment on
> > > > > > > > > >>>>>  the JIRA
> > > > > > > > > >>>>>  (https://issues.apache.org/jira/browse/METRON-265).
> > > > > > > > > >>>>>  The proposed architecture is attached as a
> > > > > > > > > >>>>>  document to that JIRA.
> > > > > > > > > >>>>>
> > > > > > > > > >>>>>  I look forward to feedback!
> > > > > > > > > >>>>>
> > > > > > > > > >>>>>  Best,
> > > > > > > > > >>>>>
> > > > > > > > > >>>>>  Casey
> > > > > > > > > >>>>
> > > > > > > > > >>>>  -------------------
> > > > > > > > > >>>>  Thank you,
> > > > > > > > > >>>>
> > > > > > > > > >>>>  James Sirota
> > > > > > > > > >>>>  PPMC- Apache Metron (Incubating)
> > > > > > > > > >>>>  jsirota AT apache DOT org
> > > > > > > > > >
> > > > > > > > > >-------------------
> > > > > > > > > >Thank you,
> > > > > > > > > >
> > > > > > > > > >James Sirota
> > > > > > > > > >PPMC- Apache Metron (Incubating)
> > > > > > > > > >jsirota AT apache DOT org
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> > >
> >
>
>
>
> --
> Thanks,
> Andrew
>
> Subscribe to my book: Streaming Data <http://manning.com/psaltis>
> <https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
> twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>
>
