spark-user mailing list archives

From "bit1129@163.com" <bit1...@163.com>
Subject Re: Re: How does Spark honor data locality when allocating computing resources for an application
Date Mon, 16 Mar 2015 06:41:14 GMT
Thanks Eric. I revisited the code and found that the spreadOutApps option is enabled by default,
via the following line: val spreadOutApps = conf.getBoolean("spark.deploy.spreadOut", true).
I had misread it as val spreadOutApps = conf.getBoolean("spark.deploy.spreadOut", false).
Thanks.
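
(As a quick check of that default, here is a minimal sketch of how SparkConf.getBoolean falls back to its second argument when the key is unset. It assumes only spark-core on the classpath; note that the Master reads this property from its own SparkConf, not the application's.)

  import org.apache.spark.SparkConf

  object SpreadOutDefault {
    def main(args: Array[String]): Unit = {
      // Key not set anywhere: getBoolean returns the supplied default, so
      // spread-out scheduling is effectively on out of the box.
      val conf = new SparkConf(false)
      println(conf.getBoolean("spark.deploy.spreadOut", true))        // true

      // Explicitly set: the configured value wins over the default argument.
      val overridden = new SparkConf(false).set("spark.deploy.spreadOut", "false")
      println(overridden.getBoolean("spark.deploy.spreadOut", true))  // false
    }
  }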



bit1129@163.com
 
From: eric wong
Date: 2015-03-14 22:36
To: bit1129@163.com; user
Subject: Re: How does Spark honor data locality when allocating computing resources for an application

You seem not to have noticed the configuration variable "spreadOutApps".

And its comment:

  // As a temporary workaround before better ways of configuring memory, we allow users to set
  // a flag that will perform round-robin scheduling across the nodes (spreading out each app
  // among all the nodes) instead of trying to consolidate each app onto a small # of nodes.
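
(For contrast with the consolidation loop quoted below, a simplified, illustrative sketch of the round-robin idea that comment describes; WorkerSlot and spreadOut are made-up names for illustration, not Spark's actual code.)

  // Illustrative only: hand out an app's remaining cores one at a time,
  // cycling across all usable workers instead of filling one worker first.
  case class WorkerSlot(id: String, coresFree: Int)

  def spreadOut(coresLeft: Int, workers: Seq[WorkerSlot]): Map[String, Int] = {
    val usable   = workers.filter(_.coresFree > 0).toArray
    val assigned = Array.fill(usable.length)(0)
    var toAssign = math.min(coresLeft, usable.map(_.coresFree).sum)
    var pos = 0
    while (toAssign > 0) {
      if (usable(pos).coresFree - assigned(pos) > 0) {
        assigned(pos) += 1
        toAssign -= 1
      }
      pos = (pos + 1) % usable.length
    }
    usable.zip(assigned).collect { case (w, n) if n > 0 => w.id -> n }.toMap
  }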

2015-03-14 10:41 GMT+08:00 bit1129@163.com <bit1129@163.com>:
Hi, sparkers,
While reading the code for computing resource allocation for a newly submitted application
in the Master#schedule method, I ran into a question about data locality:
// Pack each app into as few nodes as possible until we've assigned all its cores
for (worker <- workers if worker.coresFree > 0 && worker.state == WorkerState.ALIVE) {
  for (app <- waitingApps if app.coresLeft > 0) {
    if (canUse(app, worker)) {
      val coresToUse = math.min(worker.coresFree, app.coresLeft)
      if (coresToUse > 0) {
        val exec = app.addExecutor(worker, coresToUse)
        launchExecutor(worker, exec)
        app.state = ApplicationState.RUNNING
      }
    }
  }
}

It looks like the resource allocation policy here is that the Master assigns as few workers as
possible, as long as those workers have enough resources for the application.
My question is: assuming the data the application will process is spread across all the worker
nodes, isn't data locality lost under this policy?
I'm not sure whether I have understood this correctly or have missed something.
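
(To make the concern concrete with made-up numbers: under the pack-onto-few-nodes loop quoted above, a 10-core application on four 8-core workers gets executors on only two nodes, so blocks on the other two cannot be read NODE_LOCAL. All names below are hypothetical; this is an editorial illustration, not Spark code.)

  object ConsolidationLocalityDemo {
    case class W(id: String, var coresFree: Int)

    def main(args: Array[String]): Unit = {
      val workers   = Seq(W("node1", 8), W("node2", 8), W("node3", 8), W("node4", 8))
      val dataNodes = Set("node1", "node2", "node3", "node4") // blocks live on every node
      var coresLeft = 10                                      // cores requested by the app

      // Pack cores onto as few workers as possible, mirroring the quoted loop.
      val executorsOn = scala.collection.mutable.Set[String]()
      for (w <- workers if w.coresFree > 0 && coresLeft > 0) {
        val use = math.min(w.coresFree, coresLeft)
        w.coresFree -= use
        coresLeft   -= use
        executorsOn += w.id
      }
      // node1 and node2 absorb all 10 cores; node3 and node4 hold data but no
      // executor, so tasks on their blocks cannot run NODE_LOCAL there.
      println(s"Executors on: $executorsOn")
      println(s"Data nodes without an executor: ${dataNodes -- executorsOn}")
    }
  }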




bit1129@163.com



-- 
王海华