spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrew Melo <andrew.m...@gmail.com>
Subject Re: [DISCUSS] Remove multiple workers on the same host support from Standalone backend
Date Sat, 14 Mar 2020 22:56:32 GMT
Hi Sean

On Fri, Mar 13, 2020 at 6:46 PM Sean Owen <srowen@gmail.com> wrote:

> Do you really need a new cluster per user? and if so, why specify N
> workers > M machines? I am not seeing a need for that. I don't even
> think 2 workers on the same host makes sense, as they are both
> managing the same resources; it only exists for test purposes AFAICT.
>

Sorry, I'm from a completely different field, so I've inherited a
completely different vocabulary. So thanks for bearing with me :)

I think from reading your response, maybe the confusion is that HTCondor is
a completely different resource acquisition model than what industry is
familiar with. Unlike AWS that gives you a whole VM or k8s that gives you a
whole container, condor (and most other batch schedulers) split up a single
bare machine that your job shares with whatever else is on that machine.
You don't get your own machine or even the illusion you have your own
machine (via containerization).

Using these schedulers it's not that you ask for N workers when there's
only M machines, you request N x 8core slots when there are M cores
available, and the scheduler packs them wherever there's free resources.

> What you are trying to do sounds like one cluster, not many. JVMs

> can't be shared across users; JVM = executor. But that's a good thing,
> or else there would be all kinds of collisions.


> What pools are you referring to?


If you're talking about the 2nd half, let's say I'm running two pyspark
notebooks connected to the system above, and batch scheduler gives each of
them 2 cores of slaves. Each notebook will have their own set (which I
called a pool earlier) of slaves, so when you're working in one notebook,
the other notebook of slaves is idle. My comment was about the resources
being idle and the desire to increase utillzation.

Thanks
Andrew

Sean
>
> On Fri, Mar 13, 2020 at 6:33 PM Andrew Melo <andrew.melo@gmail.com> wrote:
> >
> > Hi Xingbo, Sean,
> >
> > On Fri, Mar 13, 2020 at 12:31 PM Xingbo Jiang <jiangxb1987@gmail.com>
> wrote:
> >>
> >> Andrew, could you provide more context of your use case please? Is it
> like you deploy homogeneous containers on hosts with available resources,
> and each container launches one worker? Or you deploy workers directly on
> hosts thus you could have multiple workers from the same application on the
> same host?
> >
> >
> > Sure, I describe a bit more detail about the actual workload below [*],
> but the short version is that our computing resources/infrastructure are
> all built around batch submission into (usually) the HTCondor scheduler,
> and we've got a PoC using pyspark to replace the interactive portion of
> data analysis. To run pyspark on our main resources, we use some scripts
> around the standalone mode to spin up N slaves per-user**, which may or may
> not end up on the same host. I understood Xingbo's original mail to mean
> that wouldn't be allowed in the future, but from Sean's response, it seems
> like I'm incorrect.
> >
> > That being said, our use-case is very bursty, and it would be very good
> if there was a way we could have one pool of JVMs that could be shared
> between N different concurrent users instead of having N different pools of
> JVMs that each serve one person. We're already resource constrained, and
> we're expecting our data rates to increase 10x in 2026, so the less idle
> CPU, the better for us.
> >
> > Andrew
> >
> > * I work for one of the LHC experiments at CERN (https://cms.cern/) and
> there's two main "phases" of our data pipeline: production and analysis.
> The analysis half is currently implemented by having users writing some
> software, splitting the input dataset(s) into N parts and then submitting
> those jobs to the batch system (combining the outputs in a manual
> postprocessing step). In terms of scale, there are currently ~100 users
> running ~900 tasks over ~50k cpus. The use case relevant to this context is
> the terminal analysis phase which involves calculating some additional
> columns, applying calibrations, filtering out the 'interesting' events and
> extracting histograms describing those events. Conceptually, it's an
> iterative process of "extract plots, look at plots, change parameters", but
> running in a batch system means the latency is bad, so it can take a long
> time to converge to the right set of params.
> >
> > ** though we have much smaller, dedicated k8s/mesos/yarn clusters we use
> for prototyping
> >
> >>
> >> Thanks,
> >>
> >> Xingbo
> >>
> >> On Fri, Mar 13, 2020 at 10:23 AM Sean Owen <srowen@gmail.com> wrote:
> >>>
> >>> You have multiple workers in one Spark (standalone) app? this wouldn't
> >>> prevent N apps from each having a worker on a machine.
> >>>
> >>> On Fri, Mar 13, 2020 at 11:51 AM Andrew Melo <andrew.melo@gmail.com>
> wrote:
> >>> >
> >>> > Hello,
> >>> >
> >>> > On Fri, Feb 28, 2020 at 13:21 Xingbo Jiang <jiangxb1987@gmail.com>
> wrote:
> >>> >>
> >>> >> Hi all,
> >>> >>
> >>> >> Based on my experience, there is no scenario that necessarily
> requires deploying multiple Workers on the same node with Standalone
> backend. A worker should book all the resources reserved to Spark on the
> host it is launched, then it can allocate those resources to one or more
> executors launched by this worker. Since each executor runs in a separated
> JVM, we can limit the memory of each executor to avoid long GC pause.
> >>> >>
> >>> >> The remaining concern is the local-cluster mode is implemented
by
> launching multiple workers on the local host, we might need to re-implement
> LocalSparkCluster to launch only one Worker and multiple executors. It
> should be fine because local-cluster mode is only used in running Spark
> unit test cases, thus end users should not be affected by this change.
> >>> >>
> >>> >> Removing multiple workers on the same host support could simplify
> the deploy model of Standalone backend, and also reduce the burden to
> support legacy deploy pattern in the future feature developments. (There is
> an example in https://issues.apache.org/jira/browse/SPARK-27371 , where
> we designed a complex approach to coordinate resource requirements from
> different workers launched on the same host).
> >>> >>
> >>> >> The proposal is to update the document to deprecate the support
of
> system environment `SPARK_WORKER_INSTANCES` in Spark 3.0, and remove the
> support in the next major version (Spark 3.1).
> >>> >>
> >>> >> Please kindly let me know if you have use cases relying on this
> feature.
> >>> >
> >>> >
> >>> > When deploying spark on batch systems (by wrapping the standalone
> deployment in scripts that can be consumed by the batch scheduler), we
> typically end up with >1 worker per host. If I understand correctly, this
> proposal would make our use case unsupported.
> >>> >
> >>> > Thanks,
> >>> > Andrew
> >>> >
> >>> >
> >>> >
> >>> >>
> >>> >> Thanks!
> >>> >>
> >>> >> Xingbo
> >>> >
> >>> > --
> >>> > It's dark in this basement.
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe e-mail: dev-unsubscribe@spark.apache.org
> >>>
>

Mime
View raw message