spark-user mailing list archives

From Steve Loughran <>
Subject Re: Access multiple cluster
Date Mon, 05 Dec 2016 11:24:44 GMT
If the remote filesystem is visible from the other cluster, then a fully qualified HDFS URL, e.g. hdfs://analytics:8000/historical/,
 can be used for reads and writes, even if your defaultFS (the one where you get maximum performance)
is, say, hdfs://processing:8000/

-performance will be slower, in both directions
-if you have a fast pipe between the two clusters, then a job with many executors may unintentionally
saturate the network, leading to unhappy people elsewhere.
-you'd better have mutual trust at the Kerberos layer. There's a configuration option (I forget
its name) to give spark-submit a list of HDFS namenodes it will need to get tokens from. Unless
your Spark cluster is being launched with keytabs, you will need to list up front all the HDFS
clusters your job intends to work with.
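A minimal sketch of building that spark-submit argument is shown below. As an assumption not stated in the post: on YARN, Spark 2.x documented this property as `spark.yarn.access.namenodes`; it was renamed `spark.yarn.access.hadoopFileSystems` in 2.2 and `spark.kerberos.access.hadoopFileSystems` in 3.0, so check the security docs for your version. The helper `token_conf_args` and the script name `join_job.py` are hypothetical.

```python
import shlex


def token_conf_args(namenodes, property_name="spark.yarn.access.namenodes"):
    """Build spark-submit --conf arguments listing every HDFS cluster the job touches.

    Hypothetical helper. The property name varies by Spark version (see the
    note above), so it is a parameter rather than hard-coded.
    """
    # The documented value is a comma-separated list of URIs; drop trailing slashes.
    uris = [uri.rstrip("/") for uri in namenodes]
    if not uris:
        return []
    return ["--conf", f"{property_name}={','.join(uris)}"]


args = [
    "spark-submit", "--master", "yarn",
    *token_conf_args(["hdfs://processing:8000/", "hdfs://analytics:8000/"]),
    "join_job.py",  # hypothetical application script
]
print(shlex.join(args))
```

Keeping the cluster list in one place makes it harder to add a new remote read or write to the job without also adding that cluster to the token list.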

On 4 Dec 2016, at 21:45, ayan guha <> wrote:


Is it possible to access hive tables sitting on multiple clusters in a single spark application?

We have a data processing cluster and an analytics cluster. I want to join a table from the analytics
cluster with another table in the processing cluster, and finally write the result back to the analytics cluster.

