spark-user mailing list archives

From "陈宇航" <>
Subject Request for submitting Spark jobs in code purely, without jar
Date Thu, 22 Oct 2015 06:43:19 GMT
Hi developers, I've encountered a problem with Spark, and before opening an issue, I'd like
to hear your thoughts.

Currently, if you want to submit a Spark job, you'll need to write the code, make a jar, and
then submit it with spark-submit or org.apache.spark.launcher.SparkLauncher. 
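For reference, this is roughly how launching through SparkLauncher looks today, as far as I understand it (the jar path and main class below are only placeholders):

import org.apache.spark.launcher.SparkLauncher

val process = new SparkLauncher()
  .setAppResource("/path/to/word-count.jar")   // the pre-built jar I would like to avoid
  .setMainClass("com.example.WordCount")       // placeholder main class
  .setMaster("yarn-client")
  .launch()                                    // starts spark-submit as a child process
process.waitFor()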

But sometimes the RDD operation chain is generated dynamically in code, from SQL or even a
GUI, so it is inconvenient or even impossible to build a separate jar. So I tried something
like the following:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("Demo").setMaster("yarn-client")
val sc = new SparkContext(conf)
// A simple word count (the input path is left empty here)
sc.textFile("").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).foreach(println)

When this is executed, a Spark job is submitted. However, some problems remain:
1. It doesn't support all deploy modes, such as yarn-cluster.
2. Because of the "only 1 SparkContext per JVM" limit, I cannot run this twice (see the sketch after this list).
3. It runs in the same process as my code; no child process is created.
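To illustrate problem 2, this is roughly what happens if the code above is run a second time in the same JVM, as far as I can tell:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("Demo").setMaster("yarn-client")
val first = new SparkContext(conf)    // works
val second = new SparkContext(conf)   // fails: "Only one SparkContext may be running in this JVM"
// A second submission is only possible after calling first.stop()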

Thus, what I wish for is that these problems can be handled by Spark itself. My request can
be described simply as adding a submit() method to SparkContext / StreamingContext / SQLContext.
I hope that if I add a line like this after the code above:

sc.submit()

then Spark handles all of the background submission work for me.
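To make the request concrete, here is a rough sketch of how I imagine it could look; submit() does not exist today, and the paths are only placeholders:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("Demo").setMaster("yarn-cluster")  // any deploy mode
val sc = new SparkContext(conf)
sc.textFile("hdfs:///tmp/input")
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)
  .saveAsTextFile("hdfs:///tmp/output")
sc.submit()  // hypothetical: Spark packages this job and submits it in a child process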

I already opened an issue for this request once, but I couldn't make myself clear back then,
so I'm writing this email to discuss it with you directly. Please reply if you need further
details, and I'll open an issue if you understand my request and believe it's worth doing.

Thanks a lot.

Yuhang Chen.