spark-user mailing list archives

From Nick Pentreath <>
Subject Re: Velox Model Server
Date Mon, 13 Jul 2015 11:33:25 GMT
Honestly I don't believe this kind of functionality belongs within
spark-jobserver.

For serving of factor-type models, you are typically in the realm of
recommendations or ad-serving scenarios - i.e. needing to score a user /
context against many possible items and return a top-k list of those. In
addition, filtering and search comes into play heavily - e.g. filter
recommendations by item category, by geo-location, by stock-level status,
by price / profit levels, by promoted / blocked content, etc. And the
requirements are typically real-time (i.e. a few hundred ms at the most).
So I think there are too many specialist requirements vs spark-jobserver.

In terms of general approach, your options are to:

(a) score first to get a list of recs, and then filter / re-rank / apply
queries to winnow that down further. This typically means returning the top
L > K recs from the scoring step, so that you have enough left after filtering.

(b) score and filter in the same step (or at least using the same "engine").

Scoring is really the easiest part - modulo massive item-sets, which can be
handled by (i) LSH / approximate nearest-neighbour approaches and/or (ii)
partitioning / parallelization approaches.
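For concreteness, here is a minimal sketch of that brute-force scoring step (function and parameter names are invented for the example; the LSH / ANN route in (i) would replace the exhaustive scan with an approximate index):

```scala
// Score one user vector against every item vector and keep the top-k.
// With ~100k items and rank 20-50 this is a few million flops per
// request, which is cheap enough to do exhaustively in memory.
def dot(a: Array[Double], b: Array[Double]): Double = {
  var s = 0.0
  var i = 0
  while (i < a.length) { s += a(i) * b(i); i += 1 }
  s
}

def topK(user: Array[Double],
         itemFactors: Map[Int, Array[Double]],
         k: Int): Seq[(Int, Double)] =
  itemFactors.iterator
    .map { case (id, v) => (id, dot(user, v)) } // score every item
    .toSeq
    .sortBy(-_._2)                              // highest score first
    .take(k)
```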

We currently use approach (a) but are looking into (b) also.
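A sketch of approach (a), with an invented `Rec` type and a simple category filter standing in for the business-rule queries; the over-fetch factor is the tuning knob that keeps enough candidates alive after filtering:

```scala
// Option (a): score for the top-L, then filter down to the top-K.
case class Rec(itemId: Int, score: Double, category: String)

def topKAfterFilter(
    scored: Seq[Rec],     // already scored, in any order
    allowed: Set[String], // e.g. category whitelist from the query
    k: Int,
    overFetch: Int = 5): Seq[Rec] = {
  val l = k * overFetch // L > K candidates from the scorer
  scored.sortBy(-_.score)
    .take(l)                                   // top-L from scoring
    .filter(r => allowed.contains(r.category)) // business-rule filter
    .take(k)                                   // final top-K
}
```

If filtering is heavy (geo, stock levels, blocked content), even a large L can come back empty, which is one motivation for approach (b).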

I think the best idea is to pick one of the existing frameworks (whether
Oryx, Velox, PredictionIO, SeldonIO etc) that best suits your requirements,
and build around that. Or build something new of your own if you want to
use Akka.

I went with Scalatra because it is what our API layer is built with, and I
find the routing DSL much nicer vs Spray. Both have good Akka integration.
I don't think the front-end matters that much (Finatra is another option).
If you want Akka then maybe Spray / Akka HTTP is the best way to go.

Our current model server, as I mentioned, is very basic and a bit hacked
together, but it may have some useful ideas or serve as a starting point if
there is interest.

On Wed, Jun 24, 2015 at 5:46 PM, Debasish Das <> wrote:

> Thanks Nick, Sean for the great suggestions...
> Since you guys have already hit these issues before, I think it would be
> great if we can add the learnings to Spark Job Server and enhance it for the
> community.
> Nick, do you see any major issues in using Spray over Scalatra?
> Looks like the Model Server API layer needs access to a performant KV store
> (Redis/Memcached), Elasticsearch (we used Solr before for item->item serving,
> but I liked the Spark-Elasticsearch integration; its REST layer is Netty
> based, unlike Solr's Jetty, and its YARN client looks more stable, so it is
> worthwhile to see if it improves over Solr-based serving) and ML models
> (which are moving towards the Spark SQL style in 1.3/1.4 with the
> introduction of the Pipeline API).
> An initial version of the KV store might be a simple LRU cache.
> For the KV store, are there any comparisons available between IndexedRDD and
> Redis/Memcached?
> Velox is using the CoreOS etcd client (which is Go based), but I am not sure
> if it is used as a full-fledged distributed cache or not. Maybe it is being
> used as a ZooKeeper alternative.
> On Wed, Jun 24, 2015 at 2:02 AM, Nick Pentreath <>
> wrote:
>> Ok
>> My view is that with "only" 100k items, you are better off serving item
>> vectors in-memory, i.e. store all item vectors in memory, and compute the user
>> * item score on-demand. In most applications only a small proportion of
>> users are active, so really you don't need all 10m user vectors in memory.
>> They could be looked up from a K-V store and have an LRU cache in memory
>> for say 1m of those. Optionally also update them as feedback comes in.
>> As far as I can see, this is pretty much what Velox does, except it
>> partitions all user vectors across nodes to scale.
>> Oryx does almost the same but Oryx1 kept all user and item vectors in
>> memory (though I am not sure about whether Oryx2 still stores all user and
>> item vectors in memory or partitions in some way).
>> Deb, we are using a custom Akka-based model server (with Scalatra
>> frontend). It is more focused on many small models in-memory (largest of
>> these is around 5m user vectors, 100k item vectors, with factor size
>> 20-50). We use Akka cluster sharding to allow scale-out across nodes if
>> required. We have a few hundred models comfortably powered by m3.xlarge AWS
>> instances. Using floats you could probably have all of your factors in
>> memory on one 64GB machine (depending on how many models you have).
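As a rough check of the sizing quoted above, assuming rank 50 and 4-byte floats:

```scala
// Back-of-envelope factor-model memory: count * rank * 4 bytes per Float.
def factorBytes(count: Long, rank: Int): Long = count * rank * 4L

// 10m user vectors at rank 50 is ~2e9 bytes (about 1.9 GB), and 100k item
// vectors are ~20 MB, so one 64GB machine leaves plenty of headroom for
// several such models.
```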
>> Our solution is not that generic and a little hacked-together - but I'd
>> be happy to chat offline about sharing what we've done. I think it still
>> has a basic client to the Spark JobServer which would allow triggering
>> re-computation jobs periodically. We currently just run batch
>> re-computation and reload factors from S3 periodically.
>> We then use Elasticsearch to post-filter results and blend content-based
>> stuff - which I think might be more efficient than SparkSQL for this
>> particular purpose.
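The serving scheme described above (all item vectors in memory, user vectors behind an LRU cache over a K-V store) can be sketched with a plain `java.util.LinkedHashMap` in access order; the `load` function here stands in for the Redis/Memcached lookup:

```scala
import java.util.{LinkedHashMap => JLinkedHashMap}
import java.util.Map.Entry

// LRU cache for user vectors: accessOrder = true makes LinkedHashMap
// evict the least-recently-used entry via removeEldestEntry.
class LruCache[K, V](maxEntries: Int) {
  private val m = new JLinkedHashMap[K, V](16, 0.75f, true) {
    override protected def removeEldestEntry(e: Entry[K, V]): Boolean =
      size() > maxEntries
  }
  // Return the cached value, or load it from the backing store on a miss.
  def getOrLoad(k: K)(load: K => V): V = m.synchronized {
    Option(m.get(k)).getOrElse { val v = load(k); m.put(k, v); v }
  }
  def entries: Int = m.synchronized(m.size())
}
```

Updating a vector as feedback comes in is then just a `put` through the same synchronized path, with a periodic write-back to the K-V store.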
>> On Wed, Jun 24, 2015 at 8:59 AM, Debasish Das <>
>> wrote:
>>> Model sizes are 10m x rank, 100k x rank range.
>>> For recommendation/topic modeling I can run batch recommendAll and then
>>> keep serving the model from a distributed cache, but then I can't
>>> incorporate a per-user re-predict if user feedback is making the
>>> current top-k stale. I have to wait for the next batch refresh, which
>>> might be 1 hr away.
>>> Spark Job Server + Spark SQL can get me fresh updates, but running a
>>> predict each time might be slow.
>>> I am guessing the better idea might be to start with batch recommendAll
>>> and then update the per-user model if it gets stale, but that needs access
>>> to the key-value store and the model over an API like Spark Job Server.
>>> I am running experiments with the job server. In general it would be nice
>>> if my key-value store and model were both managed by the same Akka-based API.
>>> Yes, Spark SQL is to filter/boost recommendation results using business
>>> logic, like user demography for example.
>>> On Jun 23, 2015 2:07 AM, "Sean Owen" <> wrote:
>>>> Yes, and typically needs are <100ms. Now imagine even 10 concurrent
>>>> requests. My experience has been that this approach won't scale nearly
>>>> far enough. The best you could probably do is async mini-batch
>>>> near-real-time scoring, pushing results to some store for retrieval,
>>>> which could be entirely suitable for your use case.
>>>> On Tue, Jun 23, 2015 at 8:52 AM, Nick Pentreath
>>>> <> wrote:
>>>> > If your recommendation needs are real-time (<1s) I am not sure job server
>>>> > and computing the recs with Spark will do the trick (though those new
>>>> > BLAS-based methods may have given sufficient speed-up).
