spark-user mailing list archives

From "yuhang.chenn" <yuhang.ch...@gmail.com>
Subject Re: How does Spark streaming's Kafka direct stream survive from worker node failure?
Date Fri, 26 Feb 2016 17:33:32 GMT
Thanks a lot.

Sent from WPS Mail

On Feb 27, 2016, at 1:02 AM, Cody Koeninger <cody@koeninger.org> wrote:
> Yes.
>
> On Thu, Feb 25, 2016 at 9:45 PM, yuhang.chenn <yuhang.chenn@gmail.com> wrote:
>> Thanks a lot.
>>
>> And I have another question: what would happen if I didn't set
>> "spark.streaming.kafka.maxRatePerPartition"? Would Spark Streaming try to
>> consume all the messages in Kafka?
>>
>> Sent from WPS Mail
>>
>> On Feb 25, 2016, at 11:58 AM, Cody Koeninger <cody@koeninger.org> wrote:
>>> The per-partition offsets are part of the RDD as defined on the driver.
>>> Have you read
>>>
>>> https://github.com/koeninger/kafka-exactly-once/blob/master/blogpost.md
>>>
>>> and/or watched
>>>
>>> https://www.youtube.com/watch?v=fXnNEq1v3VA
>>>
>>> On Wed, Feb 24, 2016 at 9:05 PM, Yuhang Chen <yuhang.chenn@gmail.com> wrote:
>>>> Hi, as far as I know, there is a 1:1 mapping between Spark partitions
>>>> and Kafka partitions, and under Spark's fault-tolerance mechanism, if a
>>>> partition fails, another partition is used to recompute its data. My
>>>> questions are below:
>>>>
>>>> When a partition (worker node) fails in Spark Streaming:
>>>> 1. Is its computation passed to another node, or does Spark just wait
>>>>    for the failed node to restart?
>>>> 2. How does the restarted partition know the offset range it should
>>>>    consume from Kafka? It should consume the same data as the failed
>>>>    one, right?
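On the open question about `spark.streaming.kafka.maxRatePerPartition`: in the direct-stream design, when this setting is unset there is no per-batch cap, so a job starting against a large backlog will try to pull the entire backlog into one batch; when it is set, each batch reads at most maxRatePerPartition × batch-interval messages per Kafka partition. A minimal Python sketch of that cap arithmetic (the function name and the numbers are illustrative, not Spark API):

```python
def capped_until_offset(from_offset, latest_offset,
                        max_rate_per_partition, batch_interval_secs):
    """Compute a batch's end offset for one Kafka partition.

    If max_rate_per_partition is None (i.e. the setting is unset),
    the batch runs all the way to the latest available offset.
    """
    if max_rate_per_partition is None:
        return latest_offset
    # Cap = messages/sec/partition * batch duration in seconds.
    cap = max_rate_per_partition * batch_interval_secs
    return min(latest_offset, from_offset + cap)

# Unset: the first batch spans the whole backlog (offsets 0..1_000_000).
print(capped_until_offset(0, 1_000_000, None, 2))   # 1000000
# Set to 5000 msgs/sec with a 2s batch: at most 10_000 messages.
print(capped_until_offset(0, 1_000_000, 5000, 2))   # 10000
```

This is why setting the rate limit matters when starting from `smallest`/earliest offsets: without it, the first batch can be arbitrarily large.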
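Cody's point that the per-partition offsets are part of the RDD as defined on the driver is the heart of the recovery story: each batch's offset ranges are fixed when the batch is defined, so a failed task can be rescheduled on any executor and will re-read exactly the same slice of the Kafka log. A toy Python model of why that recomputation is deterministic (this is not Spark code; the class and function names are illustrative):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OffsetRange:
    """Immutable description of one partition's slice of a batch,
    fixed on the driver when the batch is defined."""
    topic: str
    partition: int
    from_offset: int
    until_offset: int

def compute_partition(kafka_log, rng):
    # A (re)computed task simply re-reads exactly [from_offset, until_offset).
    return kafka_log[rng.topic][rng.partition][rng.from_offset:rng.until_offset]

# A stand-in for Kafka's append-only log: topic -> partition -> messages.
log = {"events": {0: [f"msg-{i}" for i in range(100)]}}
rng = OffsetRange("events", 0, 10, 20)

first = compute_partition(log, rng)
retry = compute_partition(log, rng)  # task failed, rescheduled on another node
assert first == retry                # same offsets -> same data
```

Because the range is immutable and Kafka retains the messages, recomputation yields identical data, which is what makes exactly-once output possible when writes are idempotent or transactional.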