spark-user mailing list archives

From: Gino Bustelo <lbust...@gmail.com>
Subject: Re: problem about cluster mode of spark 1.0.0
Date: Wed, 25 Jun 2014 00:08:33 GMT
Andrew,

Thanks for your answer. It validates our finding. Unfortunately, client mode assumes
that I'm running on a "privileged node". By privileged I mean a node that has network
access to all the workers and vice versa. That is a big assumption to make, and it is
unreasonable in certain circumstances.

I would much rather have a single point of contact, like a job server (such as
Ooyala's), that handles jar uploading and manages driver lifecycles. I think these are
basic requirements for standalone clusters.
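
To illustrate, Ooyala's spark-jobserver works along these lines (a sketch from memory
of its README; the port, endpoint names, and app/class names here may not be exact):

    # upload the application jar to the job server once
    curl --data-binary @my-app.jar localhost:8090/jars/myapp
    # then start jobs against it by name, without needing net access to the workers
    curl -d "" 'localhost:8090/jobs?appName=myapp&classPath=com.example.MyJob'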


Gino B.

> On Jun 24, 2014, at 1:32 PM, Andrew Or <andrew@databricks.com> wrote:
> 
> Hi Randy and Gino,
> 
> The issue is that standalone-cluster mode is not officially supported. Please use
> standalone-client mode instead, i.e. specify --deploy-mode client in spark-submit, or
> simply leave out this config, since it defaults to client mode.
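> 
> For example, an explicit client-mode submission looks like this (the master URL, class
> name, and jar path are placeholders):
> 
>     ./bin/spark-submit \
>       --master spark://master-host:7077 \
>       --deploy-mode client \
>       --class com.example.MyApp \
>       /path/to/my-app.jar
> 
> Omitting --deploy-mode entirely gives you the same behavior.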
> 
> Unfortunately, this is not currently documented anywhere, and the existing explanation
> for the distinction between cluster and client modes is highly misleading. In general,
> cluster mode means the driver runs on one of the worker nodes, just like the executors.
> The corollary is that the output of the application is not forwarded to the command
> that launched the application (spark-submit in this case), but is instead accessible
> through the worker logs. In contrast, client mode means the command that launches the
> application also launches the driver, while the executors still run on the worker
> nodes. This means the spark-submit command also returns the output of the application.
> For instance, it doesn't make sense to run the Spark shell in cluster mode, because
> stdin / stdout / stderr will not be redirected to the spark-submit command.
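> 
> Concretely, a println in your driver shows up in the terminal that ran spark-submit in
> client mode, but in cluster mode you have to fetch it from the worker that hosted the
> driver (the work-directory layout below is typical of a standalone worker, not a
> guarantee):
> 
>     # client mode: driver output arrives at your terminal
>     ./bin/spark-submit --deploy-mode client --class com.example.MyApp my-app.jar
>     # cluster mode: driver output lives on whichever worker hosted the driver
>     less $SPARK_HOME/work/driver-20140624120000-0000/stdout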
> 
> If you are hosting your own cluster and can launch applications from within the
> cluster, then there is little benefit to launching your application in cluster mode,
> which is primarily intended to cut down the latency between the driver and the
> executors in the first place. However, if you are still intent on using
> standalone-cluster mode, you can use the deprecated way of launching
> org.apache.spark.deploy.Client directly through bin/spark-class. Note that this is not
> recommended and only serves as a temporary workaround until we fix standalone-cluster
> mode through spark-submit.
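> 
> For reference, the deprecated invocation looks roughly like this (syntax from memory;
> the master URL, jar location, and main class are placeholders, and the jar must be
> reachable from the cluster, e.g. on HDFS):
> 
>     ./bin/spark-class org.apache.spark.deploy.Client launch \
>       spark://master-host:7077 \
>       hdfs://namenode:8020/path/to/my-app.jar \
>       com.example.MyApp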
> 
> I have filed the relevant issues: https://issues.apache.org/jira/browse/SPARK-2259 and
> https://issues.apache.org/jira/browse/SPARK-2260. Thanks for pointing this out, and we
> will get to fixing these shortly.
> 
> Best,
> Andrew
> 
> 
> 2014-06-20 6:06 GMT-07:00 Gino Bustelo <lbustelo@gmail.com>:
>> I've found that the jar will be copied to the worker from HDFS fine, but it is not
>> added to the Spark context for you. You have to know that the jar will end up in the
>> driver's working directory, so you just add the file name of the jar to the context
>> in your program.
>> 
>> In your example below, just add "test.jar" to the context.
>> 
>> Btw, the context will not have the master URL either, so add that while you are at it.
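>> 
>> Putting both workarounds together, a minimal Scala sketch (the app name, master URL,
>> and jar name are made up):
>> 
>>     import org.apache.spark.{SparkConf, SparkContext}
>> 
>>     val conf = new SparkConf()
>>       .setAppName("test")
>>       .setMaster("spark://master-host:7077") // not filled in for you in this mode
>>     val sc = new SparkContext(conf)
>>     sc.addJar("test.jar") // the jar lands in the driver's working dir, so the bare name works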
>> 
>> This is a big issue. I posted about it a week ago and got no replies. Hopefully it
>> gets more attention as more people start hitting it. Basically, spark-submit on a
>> standalone cluster with the cluster deploy mode is broken.
>> 
>> Gino B.
>> 
>> > On Jun 20, 2014, at 2:46 AM, randylu <randylu26@gmail.com> wrote:
>> >
>> > In addition, the jar file can be copied to the driver node automatically.
>> >
>> >
>> >
>> > --
>> > View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/problem-about-cluster-mode-of-spark-1-0-0-tp7982p7984.html
>> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
