spark-user mailing list archives

From Jayant Shekhar <>
Subject Re: Using GraphX with Spark Streaming?
Date Mon, 06 Oct 2014 10:02:05 GMT

It would be useful to know more details on the use case you are trying to
solve. As Tobias wrote, Spark Streaming works on DStreams; a DStream is a
continuous series of RDDs.

Do check out performance tuning:
It is important to reduce the processing time of each batch of data.
Ideally you would want data processing to keep up with the data ingestion.
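
To make the point about keeping up with ingestion concrete, here is a minimal
sketch in plain Python (not Spark API). The function name and the numbers are
made up for illustration, and it assumes batches are processed one at a time,
in order. It shows how the wait before each batch starts grows once the
per-batch processing time exceeds the batch interval.

```python
def scheduling_delays(batch_interval, processing_times):
    """Return how long each batch waits before its processing starts.

    Assumption: batch i is ready at (i + 1) * batch_interval, and batches
    are processed strictly one after another.
    """
    delays = []
    free_at = 0.0  # time at which the previous batch finishes processing
    for i, cost in enumerate(processing_times):
        ready_at = (i + 1) * batch_interval
        start = max(ready_at, free_at)  # cannot start before data is ready
        delays.append(start - ready_at)
        free_at = start + cost
    return delays


# Processing (1 s) is faster than the interval (2 s): no backlog builds up.
keeping_up = scheduling_delays(2.0, [1.0, 1.0, 1.0])

# Processing (3 s) is slower than the interval (2 s): delay grows each batch.
falling_behind = scheduling_delays(2.0, [3.0, 3.0, 3.0])
```

With these inputs the first list stays at zero, while the second grows by one
second per batch, which is the backlog you want to avoid.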


On Sun, Oct 5, 2014 at 6:45 PM, Tobias Pfeiffer <> wrote:

> Arko,
> On Sat, Oct 4, 2014 at 1:40 AM, Arko Provo Mukherjee <> wrote:
>> Apologies if this is a stupid question, but I am trying to understand
>> why this can or cannot be done. As far as I understand, streaming
>> algorithms need to be different from batch algorithms, as streaming
>> algorithms are generally incremental. Hence the question of whether
>> RDD transformations can be extended to streaming or not.
> I don't think that streaming algorithms are "generally incremental" in
> Spark Streaming. In fact, data is collected and every N seconds
> (minutes/...), the data collected during that interval is batch-processed
> as with normal batch operations. In fact, using data previously obtained
> from the stream (in previous intervals) is a bit more complicated than
> plain batch processing. If the graph you want to create only uses data from
> one interval/batch, that should be dead simple. You might want to have a
> look at
> Tobias
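
The per-interval model described above can be sketched in plain Python (this
is not Spark or GraphX code). The helper names and the sample events are
hypothetical. The sketch groups timestamped edges into fixed intervals and
then runs an ordinary batch computation, here vertex degrees, on each
interval's edges independently.

```python
from collections import defaultdict


def batches_by_interval(events, interval):
    """Group (timestamp, src, dst) events into consecutive intervals.

    Intervals that received no events are skipped in this sketch.
    """
    batches = defaultdict(list)
    for ts, src, dst in events:
        batches[int(ts // interval)].append((src, dst))
    return [batches[k] for k in sorted(batches)]


def degrees(edges):
    """Plain batch computation: undirected degree of every vertex."""
    deg = defaultdict(int)
    for src, dst in edges:
        deg[src] += 1
        deg[dst] += 1
    return dict(deg)


# Hypothetical edge stream: (timestamp in seconds, source, destination).
events = [(0.5, "a", "b"), (1.2, "b", "c"), (2.1, "a", "c"), (3.9, "c", "d")]

# Each 2-second batch is turned into its own graph summary, with no state
# carried over from earlier batches.
per_batch = [degrees(batch) for batch in batches_by_interval(events, 2.0)]
```

In Spark Streaming itself, the per-batch step would presumably run on each
batch's RDD, for example inside `DStream.foreachRDD`. Combining data across
intervals would need extra state handling, which this sketch leaves out.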
