spark-user mailing list archives

From aecc <>
Subject data within batchduration in RDD of a Dstream reliable?
Date Thu, 23 Jan 2014 20:12:19 GMT

I know that every RDD received in a DStream is replicated to 2 nodes by
default. However, if I choose a large batchDuration (say 5 minutes), is the
data received in the stream during that interval also stored reliably? If
so, how? As far as I know, it is the RDDs that are stored reliably (once an
RDD has its complete data for the batchDuration).
