spark-user mailing list archives

From Chris Fregly <ch...@fregly.com>
Subject Re: Can we use spark inside a web service?
Date Thu, 10 Mar 2016 22:00:27 GMT
Good stuff, Evan.  Looks like this is utilizing the in-memory capabilities
of FiloDB, which is pretty cool.  Looking forward to the webcast, as I don't
know much about FiloDB.

My personal recommendation here is to remove Spark from the user
request/response hot path.

I can't tell you how many times I've had to unroll that architecture at
clients - and replace it with a real database like Cassandra, Elasticsearch,
HBase, or MySQL.

Unfortunately, Spark - and Spark Streaming, especially - can lead you to
believe that Spark can be used as an application server.  This is not a
good use case for Spark.

Remember that every job launched by Spark requires at least 1 CPU core, some
memory, and an available Executor JVM to provide the CPU and memory.

Yes, you can horizontally scale this because of the distributed nature of
Spark, but it is not an efficient scaling strategy.

For example, if you're looking to scale out to 1000 concurrent requests,
that means 1000 concurrent Spark jobs.  This would require a cluster with at
least 1000 cores.  This is just not cost effective.
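
To make the arithmetic concrete, here is a rough back-of-the-envelope
sketch in Python.  It assumes one Spark job per in-flight request and one
core per job, as described above.  The function names and the figure of 8
cores per executor are made up for illustration and are not Spark settings.

```python
import math


def cores_needed(concurrent_requests: int, cores_per_job: int = 1) -> int:
    """Cores required if every in-flight request is its own Spark job."""
    return concurrent_requests * cores_per_job


def executors_needed(total_cores: int, cores_per_executor: int) -> int:
    """Executor JVMs needed to supply that many cores (rounded up)."""
    return math.ceil(total_cores / cores_per_executor)


cores = cores_needed(1000)
print(cores)                       # 1000
print(executors_needed(cores, 8))  # 125 (8 cores per executor is hypothetical)
```

The point is only that the core count grows linearly with concurrent
requests, so the cluster has to be sized for peak request concurrency.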

Use Spark for what it's good for - ad-hoc, interactive, and iterative
(machine learning, graph) analytics.  Use an application server for what
it's good for - managing a large number of concurrent requests.  And use a
database for what it's good for - storing/retrieving data.
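
One way to picture this division of labor is sketched below in Python.
This is only an illustration under assumptions: sqlite3 stands in for the
serving database (Cassandra, HBase, etc.), the table and function names are
hypothetical, and `batch_publish` stands in for the last step of an offline
Spark job rather than showing any Spark API.

```python
import sqlite3


def batch_publish(conn, rows):
    """Stand-in for the end of an offline Spark job: write precomputed
    results into the serving database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS recommendations "
        "(user_id TEXT PRIMARY KEY, items TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO recommendations VALUES (?, ?)", rows
    )
    conn.commit()


def handle_request(conn, user_id):
    """Request path in the application server: one keyed lookup,
    no Spark job is launched."""
    row = conn.execute(
        "SELECT items FROM recommendations WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0].split(",") if row else []


conn = sqlite3.connect(":memory:")  # hypothetical serving store
batch_publish(conn, [("u1", "a,b,c"), ("u2", "d")])
print(handle_request(conn, "u1"))
```

Spark does the heavy computation ahead of time, and each user request
costs a database read instead of a core on the cluster.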

And any serious production deployment will need failover, throttling, back
pressure, auto-scaling, and service discovery.

While Spark supports these to varying levels of production-readiness, Spark
is a batch-oriented system and not meant to be put on the user
request/response hot path.

For the failover, throttling, back pressure, and auto-scaling that I mentioned
above, it's worth checking out the Netflix OSS suite - particularly
Hystrix, Eureka, Zuul, Karyon, etc:  http://netflix.github.io/
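
As a rough idea of the kind of protection Hystrix provides, here is a
minimal circuit-breaker sketch in Python.  It is not Hystrix's API; the
class, its parameters, and the fallback value are all hypothetical, and a
production version would also need thread safety, timeouts, and metrics.

```python
import time


class CircuitBreaker:
    """Fail fast to a fallback after repeated backend failures."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to stay open
        self.clock = clock
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()       # open: skip the backend entirely
            self.opened_at = None       # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0               # success closes the circuit
        return result


def flaky():
    raise RuntimeError("backend down")


breaker = CircuitBreaker(max_failures=2, reset_after=30.0)
for _ in range(3):
    # The third call does not reach flaky(); the circuit is already open.
    print(breaker.call(flaky, lambda: "cached default"))
```

The request path keeps answering with a fallback while the failing
dependency gets time to recover, instead of piling more load onto it.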

Here's my GitHub project that incorporates a lot of these:
https://github.com/cfregly/fluxcapacitor

Here's a Netflix Skunkworks GitHub project that packages these up in Docker
images:  https://github.com/Netflix-Skunkworks/zerotodocker


On Thu, Mar 10, 2016 at 1:40 PM, velvia.github <velvia.github@gmail.com>
wrote:

> Hi,
>
> I just wrote a blog post which might be really useful to you -- I have just
> benchmarked being able to achieve 700 queries per second in Spark.  So, yes,
> web speed SQL queries are definitely possible.   Read my new blog post:
>
> http://velvia.github.io/Spark-Concurrent-Fast-Queries/
>
> and feel free to email me (at velvia@gmail.com) if you would like to
> follow up.
>
> -Evan


-- 

*Chris Fregly*
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com
