From: Rachana Srivastava
To: user@spark.apache.org, dev@spark.apache.org
Subject: Missing Executor Logs From Yarn After Spark Failure
Date: Tue, 19 Jul 2016 22:23:20 +0000
I am trying to find the root cause of a recent Spark application failure in production. While the Spark application is running, I can check the NodeManager's yarn.nodemanager.log-dir property to locate the Spark executor container logs.

The container directory has logs for both of the running Spark applications. Here is the view of the container logs:

drwx--x--- 3 yarn yarn  51 Jul 19 09:04 application_1467068598418_0209
drwx--x--- 5 yarn yarn 141 Jul 19 09:04 application_1467068598418_0210

But when the application is killed, both application log directories are automatically deleted. I have set all the log retention settings in YARN to very large values, yet these logs are still deleted as soon as the Spark applications crash.

Question: How can we retain these Spark application logs in YARN for debugging when a Spark application crashes for some reason?
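For reference, a minimal sketch of the yarn-site.xml settings that typically govern this behavior. This assumes log aggregation is the mechanism you want (with aggregation enabled, the NodeManager uploads container logs to HDFS on application completion and then deletes the local copies, so the logs survive the crash in HDFS rather than on local disk); the values shown are illustrative, not recommendations:

```xml
<!-- Sketch of relevant yarn-site.xml properties (values are examples only) -->
<configuration>
  <!-- Upload container logs to HDFS when the application finishes or crashes -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- How long aggregated logs are kept in HDFS (seconds); example: 7 days -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
  <!-- If aggregation is disabled, how long local container logs are kept -->
  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>604800</value>
  </property>
  <!-- Debug aid: delay deletion of local container directories (seconds) -->
  <property>
    <name>yarn.nodemanager.delete.debug-delay-sec</name>
    <value>3600</value>
  </property>
</configuration>
```

Once aggregation is enabled, the logs of a finished or crashed application can usually be retrieved with `yarn logs -applicationId application_1467068598418_0209`; `yarn.nodemanager.delete.debug-delay-sec` is the setting that keeps the local container directories around after exit for direct inspection.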
