spark-user mailing list archives

From eric wong <win19...@gmail.com>
Subject Re: How does Spark honor data locality when allocating computing resources for an application
Date Sat, 14 Mar 2015 14:36:18 GMT
You seem to have missed the configuration variable "spreadOutApps".

And its comment:
  // As a temporary workaround before better ways of configuring memory, we allow users to set
  // a flag that will perform round-robin scheduling across the nodes (spreading out each app
  // among all the nodes) instead of trying to consolidate each app onto a small # of nodes.

2015-03-14 10:41 GMT+08:00 bit1129@163.com <bit1129@163.com>:

> Hi, sparkers,
> When I read the code about computing resources allocation for the newly
> submitted application in the Master#schedule method,  I got a question
> about data locality:
>
> // Pack each app into as few nodes as possible until we've assigned all its cores
> for (worker <- workers if worker.coresFree > 0 && worker.state == WorkerState.ALIVE) {
>   for (app <- waitingApps if app.coresLeft > 0) {
>     if (canUse(app, worker)) {
>       val coresToUse = math.min(worker.coresFree, app.coresLeft)
>       if (coresToUse > 0) {
>         val exec = app.addExecutor(worker, coresToUse)
>         launchExecutor(worker, exec)
>         app.state = ApplicationState.RUNNING
>       }
>     }
>   }
> }
>
> It looks like the resource allocation policy here is that the Master will assign
> as few workers as possible, as long as those few workers have enough
> resources for the application.
> My question is: assuming the data the application will process is
> spread across all the worker nodes, isn't data locality lost under
> the above policy?
> I'm not sure whether I have understood this correctly or have missed something.
>
>
> ------------------------------
> bit1129@163.com
>



-- 
王海华
