spark-dev mailing list archives

From Renyi Xiong <renyixio...@gmail.com>
Subject Re: failure notice
Date Tue, 06 Oct 2015 15:27:35 GMT
Yes, it can recover on a different node. It uses a write-ahead log,
checkpoints the offsets of both ingress and egress (e.g. using ZooKeeper
and/or Kafka), and relies on the streaming engine's deterministic
operations.

By replaying a range of data from the checkpointed ingress offset, state
can be recovered (at-least-once semantics), and duplicate events are then
filtered out based on the checkpointed egress offset (at-most-once
semantics); together the two give exactly-once behavior.

Hope it makes sense.
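The recovery scheme described above (replay from the checkpointed ingress offset, then drop anything already emitted past the egress offset) can be sketched in plain Scala. Everything here is illustrative, not a Spark or Kafka API; the sketch assumes a deterministic engine, so an event's egress offset equals its ingress offset:

```scala
// Illustrative sketch only; no Spark/Kafka APIs. Assumes a deterministic
// engine, so an event's egress offset equals its ingress offset.
object OffsetRecovery {
  case class Checkpoint(ingressOffset: Long, egressOffset: Long)

  // `log` stands in for the durable write-ahead log, indexed by offset.
  // Returns the (offset, event) pairs that still need to be emitted downstream.
  def recover(log: IndexedSeq[String], cp: Checkpoint): Seq[(Long, String)] = {
    // At-least-once: replay everything from the checkpointed ingress offset on.
    val replayed = log.zipWithIndex.collect {
      case (event, i) if i >= cp.ingressOffset => (i.toLong, event)
    }
    // At-most-once: drop events already emitted (offset <= egress offset).
    replayed.filter { case (offset, _) => offset > cp.egressOffset }
  }
}
```

With a log of four events, a checkpoint of (ingress=1, egress=2) replays offsets 1-3 but emits only offset 3, since 1 and 2 were already delivered.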

On Mon, Oct 5, 2015 at 3:11 PM, Tathagata Das <tdas@databricks.com> wrote:

> What happens when a whole node running your "per node streaming engine
> (built-in checkpoint and recovery)" fails? Can its checkpoint and recovery
> mechanism handle whole node failure? Can you recover from the checkpoint on
> a different node?
>
> Spark and Spark Streaming were designed with the idea that executors are
> disposable, and there should not be any node-specific long term state that
> you rely on unless you can recover that state on a different node.
>
> On Mon, Oct 5, 2015 at 3:03 PM, Renyi Xiong <renyixiong0@gmail.com> wrote:
>
>> If RDDs from the same DStream are not guaranteed to run on the same
>> worker, then the question becomes:
>>
>> is it possible to specify an unlimited duration in ssc to get a
>> continuous stream (as opposed to a discretized one)?
>>
>> Say we have a per-node streaming engine (with built-in checkpoint and
>> recovery) that we'd like to integrate with Spark Streaming. Can we have a
>> never-ending batch (or RDD) this way?
>>
>> On Mon, Sep 28, 2015 at 4:31 PM, <MAILER-DAEMON@apache.org> wrote:
>>
>>> Hi. This is the qmail-send program at apache.org.
>>> I'm afraid I wasn't able to deliver your message to the following
>>> addresses.
>>> This is a permanent error; I've given up. Sorry it didn't work out.
>>>
>>> <user@spark.apache.org>:
>>> Must be sent from an @apache.org address or a subscriber address or an
>>> address in LDAP.
>>>
>>> --- Below this line is a copy of the message.
>>>
>>> Date: Mon, 28 Sep 2015 16:31:42 -0700
>>> Subject: Re: Spark streaming DStream state on worker
>>> From: Renyi Xiong <renyixiong0@gmail.com>
>>> To: Shixiong Zhu <zsxwing@gmail.com>
>>> Cc: "user@spark.apache.org" <user@spark.apache.org>
>>>
>>> You answered my question, I think: RDDs from the same DStream are not
>>> guaranteed to run on the same worker.
>>>
>>> On Thu, Sep 24, 2015 at 1:51 AM, Shixiong Zhu <zsxwing@gmail.com> wrote:
>>>
>>> > +user, -dev
>>> >
>>> > It's not clear which `compute` you mean in your question; there are
>>> > two `compute` methods here.
>>> >
>>> > 1. DStream.compute: it always runs in the driver, and all RDDs are
>>> > created in the driver. E.g.,
>>> >
>>> > DStream.foreachRDD(rdd => rdd.count())
>>> >
>>> > "rdd.count()" is called in the driver.
>>> >
>>> > 2. RDD.compute: this will run in an executor and the location is not
>>> > guaranteed. E.g.,
>>> >
>>> > DStream.foreachRDD(rdd => rdd.foreach { v =>
>>> >     println(v)
>>> > })
>>> >
>>> > "println(v)" is called in the executor.
>>> >
>>> >
>>> > Best Regards,
>>> > Shixiong Zhu
>>> >
>>> > 2015-09-17 3:47 GMT+08:00 Renyi Xiong <renyixiong0@gmail.com>:
>>> >
>>> >> Hi,
>>> >>
>>> >> I want to do a temporal join operation on a DStream across RDDs. My
>>> >> question is: are RDDs from the same DStream always computed on the
>>> >> same worker (except on failover)?
>>> >>
>>> >> thanks,
>>> >> Renyi.
>>> >>
>>> >
>>> >
>>> >
>>>
>>>
>>
>>
>
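Shixiong's driver-vs-executor distinction quoted above can be illustrated without a cluster. This plain-Scala sketch simulates where each closure runs; the `foreachRDD`/`rddForeach` stand-ins and the log labels are illustrative, not Spark APIs:

```scala
// Illustrative sketch (not Spark): which side runs which closure.
object WhereItRuns {
  val log = scala.collection.mutable.ArrayBuffer[String]()

  // Stands in for DStream.foreachRDD: its body runs once per batch,
  // on the driver.
  def foreachRDD(rdd: Seq[Int])(body: Seq[Int] => Unit): Unit = {
    log += "driver: foreachRDD body"
    body(rdd)
  }

  // Stands in for RDD.foreach: the per-element closure runs on executors,
  // wherever the partitions happen to be scheduled.
  def rddForeach(rdd: Seq[Int])(f: Int => Unit): Unit =
    rdd.foreach { v =>
      log += "executor: element closure"
      f(v)
    }
}
```

Running `WhereItRuns.foreachRDD(Seq(1, 2))(rdd => WhereItRuns.rddForeach(rdd)(_ => ()))` records one driver-side entry followed by one executor-side entry per element, mirroring the point that `rdd.count()` runs in the driver while `println(v)` runs in an executor.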
