spark-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From tdas <...@git.apache.org>
Subject [GitHub] incubator-spark pull request: Improved NetworkReceiver in Spark St...
Date Fri, 07 Feb 2014 22:53:49 GMT
GitHub user tdas opened a pull request:

    https://github.com/apache/incubator-spark/pull/559

    Improved NetworkReceiver in Spark Streaming to allow for a more graceful StreamingContext
shutdown

    Current version of StreamingContext.stop() directly killed all the receivers without waiting
for data received to be persisted and processed. This leads to a large window of data loss.
This PR primarily addresses that. It also fixes the NetworkReceiver API to be more stable
while allowing future optimizations. This has been a semi-public API (though not explicitly
mentioned anywhere) and is not heavily used. However, for future stability, this API change
is important.
    
    The detailed changes are as follows
    - Redefined NetworkReceiver public API for future stability. If any one has written their
own NetworkReceiver using the [custom receiver guidelines](http://spark.incubator.apache.org/docs/latest/streaming-custom-receivers.html),
then he/she can migrate very easily. Instead of creating the `blockGenerator` object and using
`blockGenerator += data`  to push received data to Spark, one can now calls `store(data)`.
The core interface of `onStart()` and `onStop()` remains the same.
    - Changed all the receivers to use the new API.
    - Added an additional flag to `StreamingContext.stop()` to specify whether to shutdown
gracefully. Without setting the flag, the system will shutdown immediately (current default
behavior). With the flag set, the system may take time to shutdown to ensure data received
by the system is processed completely.
    - Clearly defined the behavior of `StreamingContext.start()` and `stop()` - calling stop()
twice or calling stop() before start() will give no exception. However, calling start() twice
or start() after stop() throws an exception as they are a strong indicator of incorrect control
flow. 
    
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/incubator-spark graceful-shutdown

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-spark/pull/559.patch

----
commit eab351d05c0baef1d4b549e1581310087158d78d
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-01-23T06:17:15Z

    Update Spark Streaming Programming Guide.

commit 036a7d46187ea3f2a0fb8349ef78f10d6c0b43a9
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-01-23T06:17:42Z

    Merge remote-tracking branch 'apache/master' into docs-update

commit f06b964a51bb3b21cde2ff8bdea7d9785f6ce3a9
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-01-23T06:49:12Z

    Fixed missing endhighlight tag in the MLlib guide.

commit 6c29524639463f11eec721e4d17a9d7159f2944b
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-01-24T02:49:39Z

    Added example and figure for window operations, and links to Kafka and Flume API docs.

commit e3dcb46ab83d7071f611d9b5008ba6bc16c9f951
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-01-27T02:41:12Z

    Fixed docs on StreamingContext.getOrCreate.

commit d5b6196b532b5746e019b959a79ea0cc013a8fc3
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-01-27T04:15:58Z

    Added spark.streaming.unpersist config and info on StreamingListener interface.

commit 89a81ff25726bf6d26163e0dd938290a79582c0f
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-01-27T21:08:34Z

    Updated docs based on Patricks PR comments.

commit f338a60ae8069e0a382d2cb170227e5757cc0b7a
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-01-28T06:42:42Z

    More updates based on Patrick and Harvey's comments.

commit 34a5a6008dac2e107624c7ff0db0824ee5bae45f
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-01-29T02:02:28Z

    Updated github url to use SPARK_GITHUB_URL variable.

commit 18ff10556570b39d672beeb0a32075215cfcc944
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-01-29T05:49:30Z

    Fixed a lot of broken links.

commit 24165ff00a892809e10b5c39d4d86023260213bd
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-02-06T09:48:28Z

    Added graceful shutdown to Spark Streaming.

commit 20f10c0bac64e2a95a1cf7b44a7f440be0897d9c
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-02-07T22:07:15Z

    Refactored NetworkReceiver to clearly define the public API and separate all the code
to run a receiver into NetworkReceiverHandler.

commit 177b12f8656aeeacf8125b86bb5c8baddf9b6650
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-02-07T22:16:13Z

    Merge remote-tracking branch 'apache/master' into graceful-shutdown
    
    Conflicts:
    	streaming/src/main/scala/org/apache/spark/streaming/receivers/ActorReceiver.scala

commit 1b1cbfcddfa020fea3e6e0ff2a1fa7c1c82f6b8f
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-02-07T22:19:28Z

    Revert accidental change in slaves file.

commit 281fead3a36836dc75083fe269062bd4b7014fd4
Author: Tathagata Das <tathagata.das1565@gmail.com>
Date:   2014-02-07T22:29:56Z

    Minor changes before PR.

----


Mime
View raw message