spark-user mailing list archives

From onmstester onmstester <onmstes...@zoho.com.INVALID>
Subject Fwd: How to avoid long-running jobs blocking short-running jobs
Date Sat, 03 Nov 2018 09:19:13 GMT
You could use two separate scheduler pools with different weights, one for the ETL jobs and one for the rest: if the ETL pool's weight is about 1 and the other pool's weight is 1000, then any time a short job comes in, it effectively gets all of the available resources. Details: https://spark.apache.org/docs/latest/job-scheduling.html
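
As a minimal sketch of what that could look like (the pool names "etl" and "analysis", the allocation-file path, and the toy jobs below are all made up for illustration):

import java.io.PrintWriter
import org.apache.spark.sql.SparkSession

object FairPoolsSketch {
  def main(args: Array[String]): Unit = {
    // Write a fair-scheduler allocation file. The 1-vs-1000 weights
    // follow the suggestion above; pool names and path are invented.
    val allocFile = "/tmp/fairscheduler.xml"
    new PrintWriter(allocFile) {
      write("""<?xml version="1.0"?>
              |<allocations>
              |  <pool name="etl">
              |    <schedulingMode>FIFO</schedulingMode>
              |    <weight>1</weight>
              |  </pool>
              |  <pool name="analysis">
              |    <schedulingMode>FIFO</schedulingMode>
              |    <weight>1000</weight>
              |  </pool>
              |</allocations>
              |""".stripMargin)
      close()
    }

    val spark = SparkSession.builder()
      .appName("fair-pools-sketch")
      .config("spark.scheduler.mode", "FAIR")
      .config("spark.scheduler.allocation.file", allocFile)
      .getOrCreate()
    val sc = spark.sparkContext

    // A long-running "ETL" job submitted from one thread into the
    // low-weight pool (the pool is a thread-local property).
    val etl = new Thread(new Runnable {
      def run(): Unit = {
        sc.setLocalProperty("spark.scheduler.pool", "etl")
        sc.parallelize(1 to 1000000, 1000).map { i => Thread.sleep(1); i }.count()
      }
    })

    // A short "analysis" job from another thread in the high-weight pool;
    // as ETL tasks finish, almost all freed cores go to this pool.
    val analysis = new Thread(new Runnable {
      def run(): Unit = {
        sc.setLocalProperty("spark.scheduler.pool", "analysis")
        println("analysis result: " + sc.parallelize(1 to 1000).sum())
      }
    })

    etl.start()
    Thread.sleep(2000) // let the ETL job grab the cluster first
    analysis.start()
    etl.join(); analysis.join()
    spark.stop()
  }
}

Note that this works inside a single SparkContext, which fits your single-JVM constraint. One caveat: the fair scheduler does not preempt tasks that are already running, so the analysis pool only receives cores as individual ETL tasks finish; keeping the ETL job's tasks short (more, smaller partitions) makes the handover faster.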
Sent using Zoho Mail

============ Forwarded message ============
From: conner <mitiskysean@gmail.com>
To: <user@spark.apache.org>
Date: Sat, 03 Nov 2018 12:34:01 +0330
Subject: How to avoid long-running jobs blocking short-running jobs
============ Forwarded message ============
Hi,

I use a Spark cluster to run ETL jobs and to run analysis computations on the data after the ETL stage. The ETL jobs can keep running for several hours, but an analysis computation is a short-running job that finishes in a few seconds. The dilemma I am trapped in is that my application runs in a single JVM and can't be a cluster application, so there is currently just one SparkContext in my application. When the ETL jobs are running, they occupy all resources, including worker executors, for so long that all my analysis computation jobs are blocked.

My idea is to find a good way to divide the Spark cluster resources in two: one part for analysis computation jobs, the other for ETL jobs. If the ETL part is free, I can allocate analysis computation jobs to it as well. So I want to find a middleware that can support two SparkContexts and that can be embedded in my application.

I did some research on the third-party project Spark Job Server. It can divide Spark resources by launching another JVM to run a SparkContext with specific resources, and these operations are invisible to the upper layer, so it looked like a good solution for me. But that project runs in a single JVM of its own and only supports a REST API; I can't endure the data being transferred over TCP again, which is too slow for me. I want to get a result from the Spark cluster by TCP and give this result to the view layer to show.

Can anybody give me a good suggestion? I would be very grateful.

--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/