spark-user mailing list archives

From Chris Fregly <ch...@fregly.com>
Subject Re: Can we use spark inside a web service?
Date Thu, 10 Mar 2016 22:46:27 GMT
You are correct, Mark.  I misspoke.  Apologies for the confusion.

So the problem is even worse, given that a typical job requires multiple
tasks/cores.

I have yet to see this particular architecture work in production.  I would
love for someone to prove otherwise.
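To make the jobs-versus-cores distinction below concrete, here is a toy scheduling sketch in plain Python (the "cores", "jobs", and "tasks" are all simulated -- no Spark APIs are involved) showing why N concurrent jobs do not require N dedicated cores:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model: a fixed pool of "executor cores" runs tasks.  Each job
# fans out into several independent tasks, and the scheduler interleaves
# tasks from all concurrent jobs over whatever cores exist.
EXECUTOR_CORES = 4

def submit_job(pool, job_id, num_tasks):
    # One "job" = num_tasks independent tasks submitted to the shared pool.
    return [pool.submit(lambda t=t: (job_id, t)) for t in range(num_tasks)]

with ThreadPoolExecutor(max_workers=EXECUTOR_CORES) as pool:
    # Two concurrent jobs, 8 tasks each, sharing only 4 "cores":
    jobs = {name: submit_job(pool, name, 8) for name in ("job-a", "job-b")}
    results = {name: [f.result() for f in fs] for name, fs in jobs.items()}

total_tasks = sum(len(r) for r in results.values())
print(total_tasks)  # 16 tasks from 2 concurrent jobs completed on 4 cores
```

In real Spark the DAGScheduler plays the role of the pool here, apportioning each job's tasks across the available executor cores -- which is the point Mark makes below.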

On Thu, Mar 10, 2016 at 5:44 PM, Mark Hamstra <mark@clearstorydata.com>
wrote:

> For example, if you're looking to scale out to 1000 concurrent requests,
>> this is 1000 concurrent Spark jobs.  This would require a cluster with 1000
>> cores.
>
>
> This doesn't make sense.  A Spark Job is a driver/DAGScheduler concept
> without any 1:1 correspondence between Worker cores and Jobs.  Cores are
> used to run Tasks, not Jobs.  So, yes, a 1000 core cluster can run at most
> 1000 simultaneous Tasks, but that doesn't really tell you anything about
> how many Jobs are or can be concurrently tracked by the DAGScheduler, which
> will be apportioning the Tasks from those concurrent Jobs across the
> available Executor cores.
>
> On Thu, Mar 10, 2016 at 2:00 PM, Chris Fregly <chris@fregly.com> wrote:
>
>> Good stuff, Evan.  Looks like this is utilizing the in-memory
>> capabilities of FiloDB, which is pretty cool.  Looking forward to the
>> webcast, as I don't know much about FiloDB.
>>
>> My personal thoughts here are to remove Spark from the user
>> request/response hot path.
>>
>> I can't tell you how many times I've had to unroll that architecture at
>> clients - and replace it with a real database like Cassandra, Elasticsearch,
>> HBase, or MySQL.
>>
>> Unfortunately, Spark - and Spark Streaming especially - leads you to
>> believe that it can be used as an application server.  This is not a
>> good use case for Spark.
>>
>> Remember that every job that is launched by Spark requires 1 CPU core,
>> some memory, and an available Executor JVM to provide the CPU and memory.
>>
>> Yes, you can horizontally scale this because of the distributed nature of
>> Spark; however, it is not an efficient scaling strategy.
>>
>> For example, if you're looking to scale out to 1000 concurrent requests,
>> this is 1000 concurrent Spark jobs.  This would require a cluster with 1000
>> cores.  This is just not cost-effective.
>>
>> Use Spark for what it's good for - ad-hoc, interactive, and iterative
>> (machine learning, graph) analytics.  Use an application server for what
>> it's good for - managing a large number of concurrent requests.  And use a
>> database for what it's good for - storing/retrieving data.
>>
>> And any serious production deployment will need failover, throttling,
>> back pressure, auto-scaling, and service discovery.
>>
>> While Spark supports these to varying levels of production-readiness, it
>> is a batch-oriented system and not meant to be put on the user
>> request/response hot path.
>>
>> For the failover, throttling, back pressure, and auto-scaling that I
>> mentioned above, it's worth checking out the Netflix OSS suite -
>> particularly Hystrix, Eureka, Zuul, Karyon, etc.:  http://netflix.github.io/
>>
>> Here's my GitHub project that incorporates a lot of these:
>> https://github.com/cfregly/fluxcapacitor
>>
>> Here's a Netflix Skunkworks GitHub project that packages these up in
>> Docker images:  https://github.com/Netflix-Skunkworks/zerotodocker
>>
>>
>> On Thu, Mar 10, 2016 at 1:40 PM, velvia.github <velvia.github@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I just wrote a blog post which might be really useful to you -- I have
>>> just benchmarked achieving 700 queries per second in Spark.  So, yes,
>>> web-speed SQL queries are definitely possible.  Read my new blog post:
>>>
>>> http://velvia.github.io/Spark-Concurrent-Fast-Queries/
>>>
>>> and feel free to email me (at velvia@gmail.com) if you would like to
>>> follow up.
>>>
>>> -Evan
>>>
>>>
>>>
>>>
>>> --
>>> View this message in context:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Can-we-use-spark-inside-a-web-service-tp26426p26451.html
>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: user-help@spark.apache.org
>>>
>>>
>>
>>
>
>


-- 

*Chris Fregly*
Principal Data Solutions Engineer
IBM Spark Technology Center, San Francisco, CA
http://spark.tc | http://advancedspark.com
