Date: Thu, 7 May 2015 13:46:44 +0530
Subject: Spark Streaming with Tachyon : Some findings
From: Dibyendu Bhattacharya
To: "dev@spark.apache.org"
Cc: Tathagata Das, aaron@databricks.com, haoyuan@tachyonnexus.com

Dear All,

I have been experimenting with Spark Streaming using Tachyon as the OFF_HEAP block store. My primary reason for evaluating Tachyon is to find out whether it can solve the Spark BlockNotFoundException.

With the traditional MEMORY_ONLY StorageLevel, jobs fail with a block-not-found exception when blocks are evicted, and storing blocks with MEMORY_AND_DISK is not a good option either, as it hurts throughput considerably.

To test how Tachyon behaves, I took the latest Spark 1.4 from master, used Tachyon 0.6.4, and configured Tachyon in fault-tolerant mode. Tachyon is running on a 3-node AWS x-large cluster, and Spark is running on a 3-node AWS x-large cluster.

I used the low-level receiver-based Kafka consumer (https://github.com/dibbhatt/kafka-spark-consumer), which I wrote, to pull from Kafka and write blocks to Tachyon.

I found a throughput improvement similar to the MEMORY_ONLY case, but with very good overall memory utilization (as it is an off-heap store).

However, I found one issue on which I need clarification.

In the Tachyon case I also see BlockNotFoundException, but for a different reason. From what I can see, TachyonBlockManager.scala puts the blocks with the WriteType.TRY_CACHE configuration. Because of this, blocks are evicted from the Tachyon cache, and when Spark tries to find such a block it throws BlockNotFoundException.

I see a pull request that discusses the same issue:

https://github.com/apache/spark/pull/158#discussion_r11195271

When I changed the WriteType to CACHE_THROUGH, the exception went away, but throughput suffers again.

I am curious to know whether Tachyon has any setting that can handle block eviction from cache to disk, other than explicitly setting CACHE_THROUGH.

Regards,
Dibyendu
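
The eviction behaviour described above can be illustrated with a toy model. This is only a sketch in plain Python, not Tachyon's or Spark's API: ToyTieredStore, count_missing, the oldest-first eviction policy and the capacity numbers are all assumptions made for illustration. The two write-type names are borrowed from the message only to label the two behaviours: cache-only writes lose a block once it is evicted, while write-through keeps a persistent copy at the price of an extra write on every put.

```python
from collections import OrderedDict


class ToyTieredStore:
    """Toy model: a bounded memory cache in front of a persistent under-store.

    Hypothetical class for illustration only; it does not mirror Tachyon's API.
    """

    def __init__(self, cache_capacity, write_type):
        if write_type not in ("TRY_CACHE", "CACHE_THROUGH"):
            raise ValueError("unknown write type: %s" % write_type)
        self.cache_capacity = cache_capacity
        self.write_type = write_type
        self.cache = OrderedDict()  # block_id -> data, oldest entry first
        self.under_store = {}       # block_id -> data, never evicted

    def put(self, block_id, data):
        if self.write_type == "CACHE_THROUGH":
            # Extra synchronous write; this stands in for the throughput cost.
            self.under_store[block_id] = data
        self.cache[block_id] = data
        self.cache.move_to_end(block_id)
        # Assumed policy: evict the oldest blocks once the cache is full.
        while len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)

    def get(self, block_id):
        if block_id in self.cache:
            return self.cache[block_id]
        if block_id in self.under_store:
            return self.under_store[block_id]
        # Plays the role of the block-not-found failure seen by the reader.
        raise KeyError("block not found: %s" % block_id)


def count_missing(write_type, num_blocks=5, cache_capacity=2):
    """Write num_blocks blocks, then count how many can no longer be read."""
    store = ToyTieredStore(cache_capacity, write_type)
    for i in range(num_blocks):
        store.put("block-%d" % i, b"payload")
    missing = 0
    for i in range(num_blocks):
        try:
            store.get("block-%d" % i)
        except KeyError:
            missing += 1
    return missing
```

Calling count_missing("TRY_CACHE") and count_missing("CACHE_THROUGH") with the same parameters shows the trade-off: in this model only the write-through store can still serve blocks that have left the cache.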