spark-user mailing list archives

From Taylor Cox <Taylor....@microsoft.com.INVALID>
Subject RE: How to avoid long-running jobs blocking short-running jobs
Date Mon, 05 Nov 2018 21:45:21 GMT
Hi Conner,

What is preventing you from using a cluster model?
I wonder if docker containers could help you here?
A quick internet search turned up Mist (https://github.com/Hydrospheredata/mist), which could be useful.

Taylor

-----Original Message-----
From: conner <mitiskysean@gmail.com> 
Sent: Saturday, November 3, 2018 2:04 AM
To: user@spark.apache.org
Subject: How to avoid long-running jobs blocking short-running jobs

Hi,

I use a Spark cluster to run ETL jobs and to run analysis computations on the data after the
ETL stage. The ETL jobs can keep running for several hours, while an analysis computation is
a short-running job that finishes in a few seconds.
My dilemma is that my application runs in a single JVM and can't be a cluster application,
so it currently has just one Spark context. While the ETL jobs are running, they occupy all
resources, including the worker executors, for so long that they block all my analysis
computation jobs.

My solution is to find a good way to divide the Spark cluster's resources into two parts: one
for analysis computation jobs and another for ETL jobs. If the part for ETL jobs is free, I can
allocate analysis computation jobs to it as well.
So I want to find middleware that supports two Spark contexts and can be embedded in my
application. I did some research on the third-party project Spark Job Server. It can divide
Spark resources by launching another JVM to run a Spark context with specific resources.
These operations are invisible to the upper layer, so it's a good solution for me. But this
project runs in a single JVM and only supports a REST API, and I can't accept transferring the
data over TCP a second time, which is too slow for me. I want to get a result from the Spark
cluster over TCP and hand it to the view layer for display.
Can anybody give me some good suggestions? I would be very grateful.
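
One possible way to approximate the two-part split described above without a second Spark context is Spark's fair scheduler pools, which let jobs submitted from different threads of one application share executors by weight and minimum share. The sketch below is only an illustration under stated assumptions: the pool names `etl` and `analysis`, the weights, the `minShare` values, and the file path in the comments are all hypothetical. It generates the pool definition file in Python; the configuration keys it refers to are the same from Scala or Java. Note that the fair scheduler does not preempt tasks that are already running, so short jobs gain cores only as ETL tasks finish.

```python
import xml.etree.ElementTree as ET


def build_allocations(pools):
    """Return fair scheduler allocation XML for the given pool settings."""
    root = ET.Element("allocations")
    for name, cfg in pools.items():
        pool = ET.SubElement(root, "pool", name=name)
        ET.SubElement(pool, "schedulingMode").text = cfg["schedulingMode"]
        ET.SubElement(pool, "weight").text = str(cfg["weight"])
        ET.SubElement(pool, "minShare").text = str(cfg["minShare"])
    return ET.tostring(root, encoding="unicode")


# Hypothetical pool names and numbers; tune them for the actual cluster.
POOLS = {
    "etl": {"schedulingMode": "FIFO", "weight": 1, "minShare": 0},
    "analysis": {"schedulingMode": "FAIR", "weight": 4, "minShare": 8},
}

allocations_xml = build_allocations(POOLS)
print(allocations_xml)

# With Spark available, the generated file would be wired in roughly like this
# (the path is a placeholder):
#   conf.set("spark.scheduler.mode", "FAIR")
#   conf.set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
# and, in the thread that submits the short analysis jobs:
#   sc.setLocalProperty("spark.scheduler.pool", "analysis")
```

Whether this is enough depends on how long individual ETL tasks run: if single tasks take minutes, the analysis pool would still wait for cores, and smaller ETL partitions or a cap on ETL parallelism might be needed as well.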





--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org

