spark-user mailing list archives

From "Kelly, Jonathan" <>
Subject Using Spark with a SOCKS proxy
Date Wed, 18 Mar 2015 00:15:16 GMT
I'm trying to figure out how I might be able to use Spark with a SOCKS proxy.  That is, my
dream is to be able to write code in my IDE then run it without much trouble on a remote cluster,
accessible only via a SOCKS proxy between the local development machine and the master node
of the cluster (ignoring, for now, any dependencies that would need to be transferred--assume
it's a very simple app with no dependencies that aren't part of the Spark classpath on the
cluster).  This is possible with Hadoop by setting hadoop.rpc.socket.factory.class.default
to org.apache.hadoop.net.SocksSocketFactory and hadoop.socks.server to localhost:<port
on which a SOCKS proxy has been opened via "ssh -D" to the master node>.  However, I can't
seem to find anything like this for Spark, and I only see very few mentions of it on the user
list and on stackoverflow, with no real answers.  (See links below.)
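
For reference, the Hadoop-side setup I'm describing looks roughly like this when set programmatically (just a sketch; port 2600 and the master hostname are placeholders for my actual values, and a SOCKS proxy is assumed to already be open via "ssh -D 2600 <master node public name>"):

    import org.apache.hadoop.conf.Configuration

    // Route Hadoop RPC through the local SOCKS proxy opened with "ssh -D 2600".
    val hadoopConf = new Configuration()
    hadoopConf.set("hadoop.rpc.socket.factory.class.default",
      "org.apache.hadoop.net.SocksSocketFactory")
    hadoopConf.set("hadoop.socks.server", "localhost:2600")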

I thought I might be able to use the JVM's -DsocksProxyHost and -DsocksProxyPort system properties,
but it still does not seem to work.  That is, if I start a SOCKS proxy to my master node using
something like "ssh -D 2600 <master node public name>" then run a simple Spark app that
calls SparkConf.setMaster("spark://<master node private IP>:7077"), passing in JVM args
of "-DsocksProxyHost=locahost -DsocksProxyPort=2600", the driver hangs for a while before
finally giving up ("Application has been killed. Reason: All masters are unresponsive! Giving
up.").  It seems like it is not even attempting to use the SOCKS proxy.  Do -DsocksProxyHost/-DsocksProxyPort
not even work for Spark?  (See the unanswered similar question from somebody else about a
month ago, and the unresolved, somewhat related JIRA from a few months ago.)
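
In case it helps to see it concretely, here is roughly what I'm running (a minimal sketch; the app name, the 10.1.2.3 private IP, and port 2600 are placeholders for my actual values):

    import org.apache.spark.{SparkConf, SparkContext}

    // The driver JVM is launched with -DsocksProxyHost=localhost -DsocksProxyPort=2600;
    // setting the same system properties programmatically before creating the
    // SparkContext should be equivalent.
    System.setProperty("socksProxyHost", "localhost")
    System.setProperty("socksProxyPort", "2600")

    val conf = new SparkConf()
      .setAppName("SocksProxyTest")                // placeholder app name
      .setMaster("spark://10.1.2.3:7077")          // master node's private IP (placeholder)
    val sc = new SparkContext(conf)

    // Trivial job; in my case the driver hangs here and eventually reports
    // "All masters are unresponsive! Giving up."
    println(sc.parallelize(1 to 100).sum())
    sc.stop()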

