spark-user mailing list archives

From Jayant Shekhar <jay...@cloudera.com>
Subject Re: Using GraphX with Spark Streaming?
Date Mon, 06 Oct 2014 10:02:05 GMT
Arko,

It would be useful to know more details about the use case you are trying to
solve. As Tobias wrote, Spark Streaming works on DStreams (discretized
streams); a DStream is a continuous series of RDDs, one per batch interval.
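The "continuous series of RDDs" idea can be made concrete with a plain-Python sketch of discretization. No Spark is involved. Timestamped records are grouped into consecutive, fixed-length batches, and each batch stands in for one RDD of a DStream. The function name `discretize` and the `(timestamp, value)` record shape are illustrative assumptions, not Spark API.

```python
from collections import defaultdict


def discretize(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-length batches.

    Hypothetical helper: batch k holds the values whose timestamp t satisfies
    k * batch_interval <= t < (k + 1) * batch_interval, so each batch plays
    the role of one RDD in a DStream. Intervals with no events still yield an
    empty batch. Assumes non-negative timestamps.
    """
    if batch_interval <= 0:
        raise ValueError("batch_interval must be positive")
    buckets = defaultdict(list)
    last = -1  # index of the latest non-empty interval seen so far
    for ts, value in events:
        k = int(ts // batch_interval)
        buckets[k].append(value)
        last = max(last, k)
    # Emit every interval up to the last one, including empty ones.
    return [buckets[k] for k in range(last + 1)]


events = [(0.5, "a"), (1.2, "b"), (1.9, "c"), (4.1, "d")]
print(discretize(events, batch_interval=2))  # [['a', 'b', 'c'], [], ['d']]
```

Each inner list would then be processed with ordinary batch logic, which is the point Tobias makes further down the thread.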

Do check out the performance tuning section of the programming guide:
https://spark.apache.org/docs/latest/streaming-programming-guide.html#performance-tuning
It is important to reduce the processing time of each batch of data.
Ideally, each batch should be processed in less time than the batch
interval, so that data processing keeps up with data ingestion.
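Here is a rough plain-Python model of why processing has to keep up with ingestion. It computes how long each batch waits before it can start. The model assumes that batch i becomes ready at the end of its interval and that batches are processed one at a time, in order. It is a simplification for illustration, not Spark's actual scheduler, and `scheduling_delays` is a hypothetical name.

```python
def scheduling_delays(batch_interval, processing_times):
    """Return how long each batch waits before its processing starts.

    Simplified model (an assumption, not Spark's scheduler): batch i is
    ready at (i + 1) * batch_interval, and a batch cannot start until the
    previous batch has finished.
    """
    delays = []
    prev_finish = 0.0  # time at which the previous batch finished processing
    for i, proc in enumerate(processing_times):
        ready = (i + 1) * batch_interval
        start = max(ready, prev_finish)  # wait if the previous batch is still running
        delays.append(start - ready)
        prev_finish = start + proc
    return delays


# Processing faster than the interval: no delay builds up.
print(scheduling_delays(2.0, [1.5, 1.5, 1.5]))  # [0.0, 0.0, 0.0]
# Processing slower than the interval: the delay grows with every batch.
print(scheduling_delays(2.0, [3.0, 3.0, 3.0]))  # [0.0, 1.0, 2.0]
```

When processing time exceeds the batch interval, the waiting time grows without bound. That is why the tuning guide focuses on keeping per-batch processing time below the interval.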

Thanks,
Jayant


On Sun, Oct 5, 2014 at 6:45 PM, Tobias Pfeiffer <tgp@preferred.jp> wrote:

> Arko,
>
> On Sat, Oct 4, 2014 at 1:40 AM, Arko Provo Mukherjee <
> arkoprovomukherjee@gmail.com> wrote:
>>
>> Apologies if this is a stupid question, but I am trying to understand
>> why this can or cannot be done. As far as I understand, streaming
>> algorithms need to be different from batch algorithms, as streaming
>> algorithms are generally incremental. Hence the question whether
>> RDD transformations can be extended to streaming or not.
>>
>
> I don't think that streaming algorithms are "generally incremental" in
> Spark Streaming. Instead, data is collected and, every N seconds (or
> minutes, ...), the data collected during that interval is batch-processed
> just like a normal batch job. In fact, using data previously obtained
> from the stream (in previous intervals) is a bit more complicated than
> plain batch processing. If the graph you want to create only uses data from
> one interval/batch, that should be dead simple. You might want to have a
> look at
> https://spark.apache.org/docs/latest/streaming-programming-guide.html#discretized-streams-dstreams
>
> Tobias
>
>
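Tobias distinguishes between a graph built from a single batch and one that depends on earlier batches. Both cases can be sketched in plain Python with no Spark or GraphX calls. The sketch assumes each batch is a list of `(src, dst)` edge tuples and uses vertex degree as a stand-in graph metric. In Spark itself, the per-batch case would roughly correspond to building a GraphX graph from each RDD inside `DStream.foreachRDD`. The cross-batch case would correspond to keeping state across intervals, for example with `updateStateByKey`. Both function names below are hypothetical.

```python
from collections import Counter


def per_batch_degrees(batches):
    """Stateless: each batch's graph is built from that batch alone."""
    results = []
    for edges in batches:
        degree = Counter()
        for src, dst in edges:
            degree[src] += 1
            degree[dst] += 1
        results.append(dict(degree))
    return results


def cumulative_degrees(batches):
    """Stateful: every batch is folded into state carried across batches."""
    state = Counter()  # survives from one batch to the next
    results = []
    for edges in batches:
        for src, dst in edges:
            state[src] += 1
            state[dst] += 1
        results.append(dict(state))  # snapshot after this batch
    return results


batches = [[(1, 2), (2, 3)], [(3, 1)]]
print(per_batch_degrees(batches))   # [{1: 1, 2: 2, 3: 1}, {3: 1, 1: 1}]
print(cumulative_degrees(batches))  # [{1: 1, 2: 2, 3: 1}, {1: 2, 2: 2, 3: 2}]
```

The stateless version needs nothing beyond the current batch. The stateful version must keep `state` alive between intervals, and in Spark that state also has to be checkpointed. This is where the extra complexity comes from.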
