spark-user mailing list archives

From Tathagata Das <t...@databricks.com>
Subject Re: Is FileInputDStream returned by fileStream method a reliable receiver?
Date Wed, 04 Mar 2015 10:17:20 GMT
The file stream does not use a receiver. Maybe that was not clear in the
programming guide. I am updating it for the 1.3 release right now, and I will
make that clearer.
The file stream is also fully reliable; see this section of the programming guide:
http://spark.apache.org/docs/latest/streaming-programming-guide.html#semantics-with-files-as-input-source
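
To illustrate why a receiver-less file source can be reliable, here is a
conceptual sketch in plain Python. It is not Spark code and not Spark's
implementation: it assumes a simplified model in which each batch re-lists
the monitored directory and selects files by modification time. The names
new_files and demo are hypothetical helpers invented for this illustration.

```python
import os
import tempfile


def new_files(directory, last_seen_mtime):
    """List files in `directory` modified after `last_seen_mtime`.

    Returns (paths, newest_mtime). No data is buffered by a receiver:
    the files themselves stay in storage and can be listed again.
    """
    found = []
    newest = last_seen_mtime
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        mtime = os.path.getmtime(path)
        if mtime > last_seen_mtime:
            found.append(path)
            newest = max(newest, mtime)
    return found, newest


def demo():
    """Simulate two batches, then a replay of the second batch after a 'failure'."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.txt")
        with open(a, "w") as f:
            f.write("one\n")
        os.utime(a, (100, 100))  # fixed timestamps keep the demo deterministic
        batch1, mark1 = new_files(d, 0)

        b = os.path.join(d, "b.txt")
        with open(b, "w") as f:
            f.write("two\n")
        os.utime(b, (200, 200))
        batch2, _ = new_files(d, mark1)

        # Recovery: restart from the saved mark and list the same files again.
        replay, _ = new_files(d, mark1)

        names = lambda paths: [os.path.basename(p) for p in paths]
        return names(batch1), names(batch2), names(replay)
```

In this model a failed batch can be recomputed by listing the directory
again from the last saved mark, because the input is still sitting in the
file system. That is the intuition behind the guarantee when the files live
in a fault-tolerant file system such as HDFS.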

On Wed, Mar 4, 2015 at 2:14 AM, Emre Sevinc <emre.sevinc@gmail.com> wrote:

> Is FileInputDStream returned by fileStream method a reliable receiver?
>
> In the Spark Streaming Guide it says:
>
>       "There can be two kinds of data sources based on their *reliability*.
> Sources (like Kafka and Flume) allow the transferred data to be
> acknowledged. If the system receiving data from these *reliable* sources
> acknowledge the received data correctly, it can be ensured that no data
> gets lost due to any kind of failure. This leads to two kinds of receivers.
>
>    1. *Reliable Receiver* - A *reliable receiver* correctly acknowledges
>    a reliable source that the data has been received and stored in Spark with
>    replication.
>    2. *Unreliable Receiver* - These are receivers for sources that do not
>    support acknowledging. Even for reliable sources, one may implement an
>    unreliable receiver that do not go into the complexity of acknowledging
>    correctly."
>
>
> So I wonder whether the receivers for HDFS (and the local file system) are
> reliable, e.g. when I'm using the fileStream method to process files in a
> directory locally or on HDFS?
>
>
> --
> Emre Sevinç
>
