metron-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Simon Ball <sb...@hortonworks.com>
Subject Re: Metron-265 Model as a Service
Date Tue, 05 Jul 2016 22:46:05 GMT
I would agree with all those goals, just wanted to see if we could take some of the latency
out from the REST point of view. Even with pipelining HTTP could prove a heavy overhead for
every packet going through metron.

Overall though, I’d agree that a storm wrapping does introduce some complexity and rigidity,
but there may be strategies to mitigate this. Storm DRPC allows a more microservice style
encapsulation to an extent, with less overhead than an HTTP call for every packet going through
the scoring. What I was thinking is more a DRPC style topology that loads and wraps dynamic
model code in a bolt, than a bolt you would have to deploy as part of the topology. This gives
you the encapsulation, and the portability, but, taking your point does introduce a risk around
reliability. 

BTW: Agreed with your point about PMML. It gives end users the option to use things like KNIME,
RapidMiner et al, but certainly constrains and adds a lot of cost. Maybe it’s a future addon
for compatibility if anyone cares about those sort of tools.

Just some thoughts. I do like the REST based microservices architecture for a model repository,
hosting and maintenance, my only concern is whether it will cut it in terms of performance
on realtime scoring. 

Simon


> On 5 Jul 2016, at 23:24, James Sirota <jsirota@apache.org> wrote:
> 
> Simon,
> 
> There are several reasons to decouple model execution from Storm:
> 
> - Reliability: It's much easier to handle a failed service than a failed bolt.  You can
also troubleshoot without having to bring down the topology
> - Complexity: you de-couple the model logic from Storm logic and can manage it independently
of Storm
> - Portability: you can swap the model guts (switch from Spark to Flink, etc) and as long
as you maintain the interface you are good to go
> - Consistency: since we want to expose our models the same way we expose threat intel
then it makes sense to expose them as a service
> 
> In our vision for Metron we want to make it easy to uptake and share models.  I think
well-defined interfaces and programmatic ways of deployment, lifecycle management, and scoring
via well-defined REST interfaces will make this task easier.  We can do a few things to 
> 
> With respect to PMML I personally had not had much luck with it in production.  I would
prefer models as POJOs. 
> 
> Thanks,
> James 
> 
> 04.07.2016, 16:07, "Simon Ball" <sball@hortonworks.com>:
>> Since the models' parameters and execution algorithm are likely to be small, why
not have the model store push the model changes and scoring direct to the bolts and execute
within storm. This negates the overhead of a rest call to the model server, and the need for
discovery of the model server in zookeeper.
>> 
>> Something like the way ranger policies are updated / cached in plugins would seem
to make sense, so that we're distributing the model execution directly into the enrichment
pipeline rather than collecting in a central service.
>> 
>> This would work with simple models on single events, but may struggle with correlation
based models. However, those could be handled in storm by pushing into a windowing trident
topology or something of the sort, or even with a parallel spark streaming job using the same
method of distributing models.
>> 
>> The real challenge here would be stateful online models, which seem like a minority
case which could be handled by a shared state store such as HBase.
>> 
>> You still keep the ability to run different languages, and platforms, but wrap managing
the parallelism in storm bolts rather than yarn containers.
>> 
>> We could also consider basing the model protocol on a a common model language like
pmml, thong that is likely to be highly limiting.
>> 
>> Simon
>> 
>>>  On 4 Jul 2016, at 22:35, Casey Stella <cestella@gmail.com> wrote:
>>> 
>>>  This is great! I'll capture any requirements that anyone wants to
>>>  contribute and ensure that the proposed architecture accommodates them. I
>>>  think we should focus on a minimal set of requirements and an architecture
>>>  that does not preclude a larger set. I have found that the best driver of
>>>  requirements are installed users. :)
>>> 
>>>  For instance, I think a lot of questions about how often to update a model
>>>  and such should be represented in the architecture by the ability to
>>>  manually update a model, so as long as we have the ability to update,
>>>  people can choose when and where to do it (i.e. time based or some other
>>>  trigger). That being said, we don't want to cause too much effort for the
>>>  user if we can avoid it with features.
>>> 
>>>  In terms of the questions laid out, here are the constraints from the
>>>  proposed architecture as I see them. It'd be great to get a sense of
>>>  whether these constraints are too onerous or where they're not opinionated
>>>  enough :
>>> 
>>>    - Model versioning and retention
>>>    - We do have the ability to update models, but the training and decision
>>>       of when to update the model is left up to the user. We may want to think
>>>       deeply about when and where automated model updates can fit
>>>       - Also, retention is currently manual. It might be an easier win to
>>>       set up policies around when to sunset models (after newer versions are
>>>       added, for instance).
>>>    - Model access controls management
>>>    - The architecture proposes no constraints around this. As it stands
>>>       now, models are held in HDFS, so it would inherit the same security
>>>       capabilities from that (user/group permissions + Ranger, etc)
>>>    - Requirements around concept drift
>>>    - I'd love to hear user requirements around how we could automatically
>>>       address concept drift. The architecture as it's proposed let's the user
>>>       decide when to update models.
>>>    - Requirements around model output
>>>    - The architecture as it stands just mandates a JSON map input and JSON
>>>       map output, so it's up to the model what they want to pass back.
>>>       - It's also up to the model to document its own output.
>>>    - Any model audit and logging requirements
>>>    - The architecture proposes no constraints around this. I'd love to see
>>>       community guidance around this. As it stands, we just log using the same
>>>       mechanism as any YARN application.
>>>    - What model metrics need to be exposed
>>>    - The architecture proposes no constraints around this. I'd love to see
>>>       community guidance around this.
>>>       - Requirements around failure modes
>>>    - We briefly touch on this in the document, but it is probably not
>>>       complete. Service endpoint failure will result in blacklisting from a
>>>       storm bolt perspective and node failure should result in a new container
>>>       being started by the Yarn application master. Beyond that, the
>>>       architecture isn't explicit.
>>> 
>>>>  On Mon, Jul 4, 2016 at 1:49 PM, James Sirota <jsirota@apache.org>
wrote:
>>>> 
>>>>  I left a comment on the JIRA. I think your design is promising. One
>>>>  other thing I would suggest is for us to crowd source requirements around
>>>>  model management. Specifically:
>>>> 
>>>>  Model versioning and retention
>>>>  Model access controls management
>>>>  Requirements around concept drift
>>>>  Requirements around model output
>>>>  Any model audit and logging requirements
>>>>  What model metrics need to be exposed
>>>>  Requirements around failure modes
>>>> 
>>>>  03.07.2016, 14:00, "Casey Stella" <cestella@gmail.com>:
>>>>>  Hi all,
>>>>> 
>>>>>  I think we are at the point where we should try to tackle Model as a
>>>>>  service for Metron. As such, I created a JIRA and proposed an
>>>>  architecture
>>>>>  for accomplishing this within Metron.
>>>>> 
>>>>>  My inclination is to be data science language/library agnostic and to
>>>>>  provide a general purpose REST infrastructure for managing and serving
>>>>>  models trained on historical data captured from Metron. The assumption
is
>>>>>  that we are within the hadoop ecosystem, so:
>>>>> 
>>>>>    - Models stored on HDFS
>>>>>    - REST Model Services resource-managed via Yarn
>>>>>    - REST Model Services discovered via Zookeeper.
>>>>> 
>>>>>  I would really appreciate community comment on the JIRA (
>>>>>  https://issues.apache.org/jira/browse/METRON-265). The proposed
>>>>>  architecture is attached as a document to that JIRA.
>>>>> 
>>>>>  I look forward to feedback!
>>>>> 
>>>>>  Best,
>>>>> 
>>>>>  Casey
>>>> 
>>>>  -------------------
>>>>  Thank you,
>>>> 
>>>>  James Sirota
>>>>  PPMC- Apache Metron (Incubating)
>>>>  jsirota AT apache DOT org
> 
> ------------------- 
> Thank you,
> 
> James Sirota
> PPMC- Apache Metron (Incubating)
> jsirota AT apache DOT org
> 

Mime
View raw message