Subject: Re: spark-submit on YARN is slow
From: Andrew Or
To: Tobias Pfeiffer <tgp@preferred.jp>
Cc: user@spark.apache.org
Date: Fri, 5 Dec 2014 11:15:37 -0800

Hey Tobias,

As you suspect, the reason it's slow is that the resource manager in YARN takes a while to grant resources. YARN first needs to set up the application master (AM) container, and this AM then needs to request more containers for the Spark executors. I think this accounts for most of the overhead. The remaining overhead probably comes from how our own YARN integration code polls the application state (every second) and the cluster resource state (every 5 seconds, IIRC). I haven't explored in detail whether there are optimizations there that could speed this up, but I believe most of the overhead comes from YARN itself.

In other words, no, I don't know of any quick fix on your end to speed this up.

-Andrew

2014-12-03 20:10 GMT-08:00 Tobias Pfeiffer <tgp@preferred.jp>:

> Hi,
>
> I am using spark-submit to submit my application to YARN in "yarn-cluster"
> mode. I have both the Spark assembly jar and my application jar in HDFS,
> and I can see from the logging output that both files are used from there.
> However, it still takes about 10 seconds for my application's yarnAppState
> to switch from ACCEPTED to RUNNING.
>
> I am aware that this is probably not a Spark issue but some YARN
> configuration setting (or YARN-inherent slowness); I was just wondering if
> anyone has any advice on how to speed this up.
>
> Thanks,
> Tobias
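[Archive editor's note: for readers curious where the observed ~10 seconds goes, here is a minimal sketch of the wait loop Andrew describes. This is a hypothetical simulation, not Spark's actual YARN client code: `make_fake_report_source` stands in for the ResourceManager's application-report API, and the 1-second interval matches the application-state polling period mentioned above.]

```python
import time

# Hypothetical stand-in for asking the ResourceManager for the application
# state: it reports ACCEPTED for a number of polls, then RUNNING once the
# AM container has been set up.
def make_fake_report_source(accepted_polls):
    states = ["ACCEPTED"] * accepted_polls + ["RUNNING"]
    it = iter(states)
    return lambda: next(it)

def wait_for_running(get_state, interval=1.0, sleep=time.sleep):
    """Poll the application state once per `interval` seconds until it
    leaves ACCEPTED.

    An app that sits in ACCEPTED for N polls therefore adds roughly
    N * interval seconds before submission appears to complete -- which
    is why a ~10 s ACCEPTED phase shows up as a ~10 s submit delay.
    """
    polls = 0
    state = get_state()
    while state == "ACCEPTED":
        sleep(interval)
        state = get_state()
        polls += 1
    return state, polls

# Simulate an app that stays ACCEPTED for 10 polls (~10 s at 1 s/poll);
# sleep is stubbed out so the simulation runs instantly.
state, polls = wait_for_running(make_fake_report_source(10),
                                interval=1.0, sleep=lambda _: None)
print(state, polls)  # RUNNING 10
```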