From: kant kodali
Date: Fri, 23 Feb 2018 02:45:46 -0800
Subject: What happens if I can't fit data into memory while doing stream-stream join.
To: "user @spark"

Hi All,

I am experimenting with the Spark 2.3.0 stream-stream join feature to see if I can use it to replace some of our existing services.

Imagine I have 3 worker nodes, each with 16GB of RAM and a 100GB SSD. My input dataset, which is in Kafka, is about 250GB per day. I want to do a stream-stream join across 8 DataFrames with a 24-hour watermark on a tumbling window, so I need to hold state for 24 hours, after which I can clear all of it.
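For a rough sense of scale, here is a back-of-envelope sketch in Python. It assumes (an assumption, not a measurement) that the join has to buffer the full 24 hours of input as state, and that the state takes the same space in memory as the raw data in Kafka; deserialized state plus bookkeeping is likely larger, and executors get less than the full node RAM. The constant names and the `capacity_check` helper are hypothetical. With these numbers, about 250GB of state compares against 48GB of total RAM and 300GB of total SSD.

```python
# Back-of-envelope capacity check for the cluster described above.
# Assumption: all 24 hours of input are retained as join state, and state
# size equals raw input size (serialization/state-store overhead ignored).

NODES = 3
RAM_PER_NODE_GB = 16
SSD_PER_NODE_GB = 100
INPUT_PER_DAY_GB = 250
STATE_RETENTION_HOURS = 24


def capacity_check():
    """Compare estimated join state against cluster-wide RAM and SSD."""
    total_ram = NODES * RAM_PER_NODE_GB  # whole-cluster RAM
    total_ssd = NODES * SSD_PER_NODE_GB  # whole-cluster SSD
    # Fraction of a day's input that must be held, scaled by retention window.
    state = INPUT_PER_DAY_GB * STATE_RETENTION_HOURS / 24
    return {
        "total_ram_gb": total_ram,
        "total_ssd_gb": total_ssd,
        "state_gb": state,
        "state_fits_in_ram": state <= total_ram,
        "state_fits_on_ssd": state <= total_ssd,
        "state_per_node_gb": state / NODES,  # assumes perfectly even key spread
    }


if __name__ == "__main__":
    for key, value in capacity_check().items():
        print(f"{key}: {value}")
```

Under these assumptions the state would not fit in memory even if it were spread perfectly evenly across the three nodes, which is what prompts the questions below.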

Questions:

1) What happens if I can't fit the data into memory while doing a stream-stream join?
2) What storage level should I choose here for near-optimal performance?
3) Any other suggestions?

Thanks!