spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-1526) Running spark driver program from my local machine
Date Fri, 23 Jan 2015 12:36:34 GMT

    [ https://issues.apache.org/jira/browse/SPARK-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289190#comment-14289190
] 

Sean Owen commented on SPARK-1526:
----------------------------------

This may be a little bold to close, but there has been no activity, I do not see an actionable
change here, and I think there is a fine workaround for this case. Yes, it is a pretty fundamental
property of Spark that the driver communicates a lot with the executors, and I can't see that
changing. You can of course run the driver remotely; it's a matter of network config, and of
having enough network bandwidth to support however much communication your driver and executors
need, which is not necessarily a lot. Finally, you can of course access resources like DBs from
your executors too. In fact that is probably more sensible than loading on the driver and then
copying again to the executors.
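That last point is the usual pattern in practice: have each executor task fetch the file names it needs from the DB itself, rather than loading them all on the driver and shipping them out again. A minimal Python sketch of the idea, using a stdlib sqlite3 database as a stand-in for the reporter's local DB (the table name, column names, and helper functions are hypothetical; in real Spark code, `fetch_partition_paths` would be the function handed to `rdd.mapPartitions`, with each task opening its own connection):

```python
import sqlite3

DB_PATH = "files.db"  # hypothetical path to the DB holding the file names


def setup_demo_db(path=DB_PATH):
    # Tiny stand-in for the reporter's "local DB with names of files":
    # each row maps a partition/shard id to one file name.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS files (shard INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO files VALUES (?, ?)",
        [(0, "/data/a.log"), (0, "/data/b.log"), (1, "/data/c.log")],
    )
    conn.commit()
    conn.close()


def fetch_partition_paths(shard, db_path=DB_PATH):
    # This is the body you would run inside mapPartitions: each executor
    # task opens its OWN connection and queries only the names it needs,
    # instead of the driver collecting everything and copying it to the
    # executors a second time.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM files WHERE shard = ? ORDER BY name", (shard,)
        )
        return [name for (name,) in rows]
    finally:
        conn.close()
```

The caveat, as the issue itself notes, is that this only works if the executors can reach the DB over the network; a DB that lives only on the developer's laptop would have to be exposed to the cluster, or the (usually small) list of names fetched once on the driver and broadcast instead.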

> Running spark driver program from my local machine
> --------------------------------------------------
>
>                 Key: SPARK-1526
>                 URL: https://issues.apache.org/jira/browse/SPARK-1526
>             Project: Spark
>          Issue Type: Wish
>          Components: Spark Core
>            Reporter: Idan Zalzberg
>
> Currently it seems that the design choice is that the driver program should be close
network-wise to the worker and allow connections to be created from either side.
> This makes using Spark somewhat harder, since when I develop locally I need not only to package my program, but also all its local dependencies.
> Let's say I have a local DB with the names of files in Hadoop that I want to process with Spark; now I need my local DB to be accessible from the cluster so it can fetch the file names at runtime.
> The driver program is an awesome thing, but it loses some of its strength if you can't really run it anywhere.
> It seems to me that the problem is the DAGScheduler, which needs to be close to the workers; maybe it shouldn't be embedded in the driver, then?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

