spark-user mailing list archives

From "guxiaobo1982" <guxiaobo1...@qq.com>
Subject Re: Reply: Reply: Any best practice for hardware configuration for the master server in standalone cluster mode?
Date Thu, 02 Jan 2014 04:50:30 GMT
Spark 0.8.1 is released now; do you mean we can share cached RDDs with this version?
  

 

 ------------------ Original ------------------
  From:  "Sriram Ramachandrasekaran"<sri.rams85@gmail.com>;
 Date:  Jan 2, 2014
 To:  "user"<user@spark.incubator.apache.org>; 
 
 Subject:  Re: Reply: Reply: Any best practice for hardware configuration for the master server in standalone cluster mode?

 

 Yes, the driver runs on the machine from which you launch your Spark job. As for sharing cached RDDs, I don't think it's possible up to and including 0.8.1; RDDs are not available across Spark contexts, if my understanding is right.

 If you still want to share RDDs, you might have to write a single service that maintains the cached RDD, and have the various other apps that want to access that RDD talk to that service. If I understand right, this is how Shark handles SQL queries.
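
 A minimal sketch of that pattern, for illustration only: one long-running JVM owns the SparkContext and the cached RDD, and other applications talk to it over a plain socket. The object name, host names, paths, port, and the one-keyword query protocol are all made up for this sketch; only the SparkContext/RDD calls are Spark's own API.

```scala
import java.io.PrintWriter
import java.net.ServerSocket
import scala.io.Source

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical sketch: a single service JVM owns the SparkContext and
// the cached RDD. Other apps cannot attach to this context directly,
// so they send requests here instead.
object SharedRddService {
  def main(args: Array[String]) {
    val sc = new SparkContext("spark://master-host:7077", "shared-rdd-service")
    // Load once, cache once; every request below reuses this in-memory RDD.
    val lines: RDD[String] = sc.textFile("hdfs://namenode/data/events.log").cache()

    val server = new ServerSocket(9999) // arbitrary port for this sketch
    while (true) {
      val client = server.accept()
      try {
        // Made-up protocol: client sends one keyword, service replies
        // with the count of cached lines containing it.
        val keyword = Source.fromInputStream(client.getInputStream).getLines().next()
        val count = lines.filter(_.contains(keyword)).count()
        new PrintWriter(client.getOutputStream, true).println(count)
      } finally {
        client.close()
      }
    }
  }
}
```

 Shark's server mode is the same idea: one process owns the cached tables, and many clients submit queries to it rather than each holding their own context.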

 

 On Tue, Dec 31, 2013 at 7:46 PM, guxiaobo1982 <guxiaobo1982@qq.com> wrote:
  We have different developers sharing a Spark cluster, and we don't let developers touch the master server. Each developer will submit their application from their own desktop; does each driver then run on their desktop?
 
 By the way, can developers share cached RDDs?
 
 
  

 

 ------------------ Original ------------------
  Sender: "Mayur Rustagi"<mayur.rustagi@gmail.com>;
 Send time: Tuesday, Dec 31, 2013 10:11 PM
 To: "user"<user@spark.incubator.apache.org>; 
 
 Subject: Re: Reply: Any best practice for hardware configuration for the master server in standalone cluster mode?

 

 The driver is the process that manages execution across the cluster. Say your application is a SQL query: the system spawns a shark-cli driver that uses the Spark framework, HDFS, etc. to execute the query and deliver the result. All of this happens automatically, so as a user of the Spark/Shark framework you don't need to worry about it. Just go for a bigger machine for the master.


 
  Mayur Rustagi
Ph: +919632149971  http://www.sigmoidanalytics.com
  https://twitter.com/mayur_rustagi
 






 On Tue, Dec 31, 2013 at 7:01 PM, guxiaobo1982 <guxiaobo1982@qq.com> wrote:
  Thanks for your reply. I am a new hand at Spark; does "driver" mean the server from which user applications are submitted?
 

  

 

 ------------------ Original ------------------
  Sender: "Mayur Rustagi"<mayur.rustagi@gmail.com>;
 Send time: Tuesday, Dec 31, 2013 9:55 PM
 To: "user"<user@spark.incubator.apache.org>; 
 
 Subject: Re: Any best practice for hardware configuration for the master server in standalone cluster mode?

 

 The master server needs to be a little beefy, as the driver runs on it. We ran into some scaling issues caused by the master server. If you offload the drivers to workers or other machines, the master server can be smaller.

Regards
Mayur

 
  Mayur Rustagi
Ph: +919632149971  http://www.sigmoidanalytics.com
  https://twitter.com/mayur_rustagi
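
 Offloading the driver as suggested above needs no special configuration in 0.8.x standalone mode: the driver lives wherever the JVM that constructs the SparkContext runs. A hedged sketch (object name and host names are made up):

```scala
import org.apache.spark.SparkContext

// Run this program on any machine that can reach the cluster -- a worker,
// a gateway box, or a developer desktop. That JVM becomes the driver;
// the standalone master at master-host only schedules resources.
object DriverAnywhere {
  def main(args: Array[String]) {
    // "spark://master-host:7077" points at the standalone master; the
    // driver itself runs right here, not on master-host.
    val sc = new SparkContext("spark://master-host:7077", "driver-anywhere")
    val total = sc.parallelize(1 to 1000).map(_ * 2).reduce(_ + _)
    println("sum of doubles: " + total)
    sc.stop()
  }
}
```

 This is why keeping drivers off the master keeps it small: the master's own work (scheduling, worker heartbeats) is light compared to a driver holding job state.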
 






 On Tue, Dec 31, 2013 at 6:48 PM, guxiaobo1982 <guxiaobo1982@qq.com> wrote:
  Hi,
 

 I read the following article regarding hardware configurations for the worker servers in standalone cluster mode, but what about the master server?
 

 http://spark.incubator.apache.org/docs/latest/hardware-provisioning.html
 

 

 Regards,
 

 Xiaobo Gu
 








 




 

-- 
It's just about how deep your longing is!