spark-dev mailing list archives

From Asher Krim <ak...@hubspot.com>
Subject Spark Local Pipelines
Date Sun, 12 Mar 2017 22:15:14 GMT
Hi All,

I spent a lot of time at Spark Summit East this year talking with Spark
developers and committers about challenges with productizing Spark. One of
the biggest shortcomings I've encountered in Spark ML pipelines is the lack
of a way to serve single requests with any reasonable performance.
SPARK-10413 explores adding methods for single-item prediction, but I'd
like to explore a more holistic approach: a separate local API, with
models that support transformations without depending on Spark at all.

I've written up a doc
<https://docs.google.com/document/d/1Ha4DRMio5A7LjPqiHUnwVzbaxbev6ys04myyz6nDgI4/edit?usp=sharing>
detailing the approach, and I'm happy to discuss alternatives. If this
gains traction, I can create a branch with a minimal example on a simple
transformer (probably something like CountVectorizerModel) so we have
something concrete to continue the discussion on.
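
To give a rough feel for what a Spark-free local transformer could look
like, here is a minimal sketch in plain Python. It is only an illustration
under assumptions: the class names (LocalTokenizer,
LocalCountVectorizerModel, LocalPipeline) and the sparse {index: count}
output are hypothetical, and are not taken from the doc or from any
existing Spark API.

```python
from collections import Counter
from typing import Dict, List, Sequence


class LocalTokenizer:
    """Hypothetical stage: lowercases a string and splits it on whitespace."""

    def transform(self, text: str) -> List[str]:
        return text.lower().split()


class LocalCountVectorizerModel:
    """Hypothetical Spark-free counterpart of a fitted count vectorizer."""

    def __init__(self, vocabulary: Sequence[str]) -> None:
        # Build the term -> column index lookup once, when the model is loaded.
        self.index: Dict[str, int] = {t: i for i, t in enumerate(vocabulary)}

    def transform(self, tokens: Sequence[str]) -> Dict[int, float]:
        # Count in-vocabulary tokens only; unknown tokens are dropped.
        counts = Counter(self.index[t] for t in tokens if t in self.index)
        # Return a sparse vector as {column index: count}, ordered by index.
        return {i: float(c) for i, c in sorted(counts.items())}


class LocalPipeline:
    """Hypothetical pipeline: applies each stage to a single item in order."""

    def __init__(self, stages) -> None:
        self.stages = list(stages)

    def transform(self, item):
        for stage in self.stages:
            item = stage.transform(item)
        return item


# A single request is served with ordinary in-process calls, with no
# DataFrame, SparkContext, or job scheduling involved.
pipeline = LocalPipeline([
    LocalTokenizer(),
    LocalCountVectorizerModel(["spark", "ml", "local"]),
])
features = pipeline.transform("Spark local spark serving")
```

The point of the sketch is that the fitted state (here, just the
vocabulary) is all the local model needs, so a single prediction reduces
to a few dictionary lookups.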

Thanks,
Asher Krim
Senior Software Engineer
