spark-user mailing list archives

From <>
Subject Re: Spark 1.4.0 - Using SparkR on EC2 Instance
Date Fri, 26 Jun 2015 17:21:54 GMT
So you created an EC2 instance with RStudio installed first, then installed Spark under that
same username?  That makes sense; I just want to verify your workflow.

Thank you again for your willingness to help! 

On Fri, Jun 26, 2015 at 10:13 AM -0700, "Shivaram Venkataraman" <>

I was using RStudio on the master node of the same cluster in the demo. However, I had installed
Spark under the user `rstudio` (i.e., in /home/rstudio), which makes the permissions work
correctly. You will still need to copy the config files from /root/spark/conf after installing
Spark, though, and it might need some more manual tweaks.
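That copy step could be sketched as follows. This is only an illustration, assuming Spark was installed to /home/rstudio/spark by the `rstudio` user; the destination path and ownership commands are assumptions, not something specified in the thread:

```r
# Run as a user that can read /root/spark/conf (e.g. from a root R session).
# Source: the cluster config written by the EC2 launch scripts.
conf.src  <- "/root/spark/conf"
# Destination: the Spark install under the rstudio user -- assumed path.
dest.home <- "/home/rstudio/spark"

# Copy the whole conf directory into the rstudio-owned installation.
file.copy(conf.src, dest.home, recursive = TRUE, overwrite = TRUE)

# Make sure the rstudio user owns the copied files so RStudio can read them.
system("chown -R rstudio:rstudio /home/rstudio/spark/conf")
```

Some of the copied files (e.g. spark-env.sh) may still reference root-owned paths, which is likely where the extra manual tweaks come in.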
On Fri, Jun 26, 2015 at 9:59 AM, Mark Stephenson <> wrote:
In your demo video, were you using RStudio to hit a separate EC2 Spark cluster?  I noticed
from your browser that you appeared to be using EC2 at the time, so I was just curious.
It appears that might be one of the possible workarounds - fire up a separate EC2 instance
with RStudio Server that initializes the Spark context against a separate Spark cluster.

On Jun 26, 2015, at 11:46 AM, Shivaram Venkataraman <> wrote:
We don't have a documented way to use RStudio on EC2 right now. We have a ticket open
to discuss workarounds and potential solutions for this.
On Fri, Jun 26, 2015 at 6:27 AM, RedOakMark <> wrote:
Good morning,

I am having a bit of trouble finalizing the installation and usage of the newest Spark
version, 1.4.0, deploying to an Amazon EC2 instance and using RStudio to run on top of it.

Using these instructions ( <> ) we can fire up an EC2 instance (which we have been
successful doing - we have gotten the cluster to launch from the command line without an
issue).  Then, I installed RStudio Server on the same EC2 instance (the master) and
successfully logged into it (using the test/test user) through the web interface.
This is where I get stuck - within RStudio, when I try to reference/find the folder where
SparkR was installed, to load the SparkR library and initialize a SparkContext, I get
permissions errors on the folders, or the library cannot be found because I cannot find
the folder in which the library is installed.
Has anyone successfully launched and utilized SparkR 1.4.0 in this way, with

RStudio Server running on top of the master instance?  Are we on the right

track, or should we manually launch a cluster and attempt to connect to it

from another instance running R?
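For reference, once the install location is known, loading SparkR from RStudio in Spark 1.4.0 typically looks like the sketch below. The /home/rstudio/spark path is an assumption based on the rest of this thread; set SPARK_HOME to wherever Spark actually lives on your instance:

```r
# Tell R where Spark is installed -- this path is an assumption,
# adjust it to your actual installation directory.
Sys.setenv(SPARK_HOME = "/home/rstudio/spark")

# Load the SparkR package bundled inside the Spark 1.4.0 distribution.
library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))

# Initialize the SparkContext; use the cluster's master URL instead of
# "local[*]" when running against the EC2 cluster.
sc <- sparkR.init(master = "local[*]", appName = "RStudioTest")

# A SQLContext is needed for the SparkR DataFrame API in 1.4.0.
sqlContext <- sparkRSQL.init(sc)
```

If the `lib.loc` path is wrong or unreadable by the rstudio user, you will see exactly the "library cannot be found" / permissions errors described above.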

Thank you in advance!


