spark-user mailing list archives

From "Shao, Saisai" <saisai.s...@intel.com>
Subject RE: MappedStream vs Transform API
Date Mon, 16 Mar 2015 08:44:58 GMT
I think both ways are fine for writing a streaming job. `transform` is the more general
way to go from one DStream to another when there is no corresponding DStream API (but
there is a corresponding RDD API). Using map may be more straightforward and easier to understand.
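
For example (a minimal sketch; `lines` is an assumed DStream[String], not from the original mail, and RDD.sortBy has no DStream counterpart, so transform is the natural fit there):

  import org.apache.spark.streaming.dstream.DStream

  // `lines` is an assumed DStream[String], used only for illustration.
  // sortBy exists on RDD but not on DStream, so apply it per batch via transform:
  val sortedPerBatch: DStream[String] = lines.transform(rdd => rdd.sortBy(identity))

  // For a plain element-wise mapping, the DStream API is more direct:
  val upperCased: DStream[String] = lines.map(_.toUpperCase)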

Thanks
Jerry

From: madhu phatak [mailto:phatak.dev@gmail.com]
Sent: Monday, March 16, 2015 4:32 PM
To: user@spark.apache.org
Subject: MappedStream vs Transform API

Hi,
  The current implementation of the map function in Spark Streaming looks like this:

  def map[U: ClassTag](mapFunc: T => U): DStream[U] = {
    new MappedDStream(this, context.sparkContext.clean(mapFunc))
  }
It creates an instance of MappedDStream, which is a subclass of DStream.
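
For context, MappedDStream itself is a thin wrapper; its core looks roughly like this (paraphrased from the Spark source, so details may vary across versions):

  private[streaming]
  class MappedDStream[T: ClassTag, U: ClassTag](
      parent: DStream[T],
      mapFunc: T => U
    ) extends DStream[U](parent.ssc) {

    // The only dependency is the parent stream.
    override def dependencies: List[DStream[_]] = List(parent)

    // The stream slides at the same rate as its parent.
    override def slideDuration: Duration = parent.slideDuration

    // For each batch interval, apply mapFunc to the parent's RDD, if one was generated.
    override def compute(validTime: Time): Option[RDD[U]] = {
      parent.getOrCompute(validTime).map(_.map[U](mapFunc))
    }
  }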

The same function can also be implemented using the transform API:

  def map[U: ClassTag](mapFunc: T => U): DStream[U] =
    this.transform(rdd => {
      rdd.map(mapFunc)
    })
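
As a quick illustration (again assuming a hypothetical DStream[String] named `lines`), both versions are used identically:

  val viaMap: DStream[Int] = lines.map(_.length)                    // goes through MappedDStream
  val viaTransform: DStream[Int] = lines.transform(_.map(_.length)) // goes through TransformedDStream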

Both implementations look the same. If they are the same, is there any advantage to having a subclass
of DStream? Why can't we just use the transform API?


Regards,
Madhukara Phatak
http://datamantra.io/