spark-user mailing list archives

From John Omernik <j...@omernik.com>
Subject Re: Spark on Mesos: Multiple Users with iPython Notebooks
Date Fri, 20 Feb 2015 13:24:34 GMT
Awesome! This is exactly what I'd need.  Unfortunately, I am not a
programmer of any talent or skill, but how could I assist with this
JIRA? From a User perspective, this is really the next step for my org
taking our Mesos cluster to user land with Spark. I don't want to be
pushy, but is there any sort of time frame I could possibly
communicate to my team? Anything I can do?

Thanks!

On Fri, Feb 20, 2015 at 4:36 AM, Iulian Dragoș
<iulian.dragos@typesafe.com> wrote:
>
>
> On Thu, Feb 19, 2015 at 2:49 PM, John Omernik <john@omernik.com> wrote:
>>
>> I am running Spark on Mesos and it works quite well.  I have three
>> users, all of whom set up iPython notebooks to instantiate a Spark instance
>> to work with on the notebooks. I love it so far.
>>
>> Since I am "auto" instantiating (I don't want a user to have to
>> "think" about instantiating and submitting a Spark app to do ad hoc
>> analysis; I want the environment set up ahead of time), this is done
>> whenever an iPython notebook is opened.  So far it's working pretty
>> well, save one issue:
>>
>> Every notebook is a new driver. I.e., every time they open a notebook,
>> a new spark submit is called and the driver resources are allocated,
>> regardless of whether they are used.  Yes, it's only the driver, but
>> even that, I find, starts slowing down my queries for the notebooks
>> that are using Spark.  (I am running in Mesos fine-grained mode.)
>>
>>
>> I have three users on my system. Ideally, I would love to find a way
>> so that when the first notebook is opened, a driver is started for
>> that user and can then be used for any notebook the user has open. So
>> if they open a new notebook, I can check that yes, the user has a
>> Spark driver running, and thus that notebook, if there is a query,
>> will run it through that driver. That allows me to understand the
>> resource allocation better, and it keeps users from running 10
>> notebooks and tying up a lot of resources.
>>
>> The other thing I was wondering is whether the driver could actually
>> be run on the Mesos cluster. Right now, I have an "edge" node as an
>> iPython server, and the drivers all exist on that server, so as I get
>> more and more drivers, the box's local resources get depleted by
>> unused drivers.  Obviously, if I could reuse the drivers per user on
>> that box, that would be a great first step, but if I could reuse
>> drivers and run them on the cluster, that would be ideal.  Looking
>> through the docs, I was not clear on those options. If anyone could
>> point me in the right direction, I would greatly appreciate it!
>
>
> Cluster mode support for Spark is tracked under
> [SPARK-5338](https://issues.apache.org/jira/browse/SPARK-5338). I know Tim
> Chen is working on it, so there will be progress soon.
>
> iulian
>
>>
>>
>> John
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
>> For additional commands, e-mail: user-help@spark.apache.org
>>
>
>
>
> --
>
> --
> Iulian Dragos
>
> ------
> Reactive Apps on the JVM
> www.typesafe.com
>


