spark-user mailing list archives

From Jörn Franke <>
Subject Re: Spark for core business-logic? - Replacing: MongoDB?
Date Mon, 05 Jan 2015 05:24:05 GMT

It really depends on your requirements: what kind of machine learning
algorithms you need, your budget, whether you are building something
really new or integrating it with an existing application, etc. MongoDB
can also be run as a cluster. I don't think this question can be answered
in general; it depends on the details of your case.

Best regards
On 4 Jan 2015 01:44, "Alec Taylor" <> wrote:

> In the middle of doing the architecture for a new project, which has
> various machine learning and related components, including:
> recommender systems, search engines and sequence [common intersection]
> matching.
> Usually I use: MongoDB (as db), Redis (as cache) and celery (as queue,
> backed by Redis).
> Though I don't have experience with Hadoop, I was thinking of using
> Hadoop for the machine learning (as this will become a Big Data
> problem quite quickly). To push the data into Hadoop, I would use a
> connector of some description, or push the MongoDB backups into HDFS
> at set intervals.
> However I was thinking that it might be better to put the whole thing
> in Hadoop, store all persistent data in Hadoop, and maybe do all the
> layers in Apache Spark (with caching remaining in Redis).
> Is that a viable option? Most of what I see discusses Spark (and
> Hadoop in general) for analytics only. Apache Phoenix exposes a nice
> interface for read/write over HBase, so I might use that if Spark ends
> up being the wrong solution.
> Thanks for all suggestions,
> Alec Taylor
> PS: I need this for both "Big" and "Small" data. Note that I am using
> the Cloudera definition of "Big Data" referring to processing/storage
> across more than 1 machine.
