spark-user mailing list archives

From madhu phatak <phatak....@gmail.com>
Subject Re: MappedStream vs Transform API
Date Mon, 16 Mar 2015 10:37:18 GMT
Hi,
 Thanks for the response. I understand that part. But I am asking why the
internal implementation uses a subclass when it could use an existing API.
Unless there is a real difference, it feels like a code smell to me.
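
To make the question concrete, here is a minimal sketch in plain Scala. It is
a toy model, not Spark's actual code: ToyStream, MappedToyStream,
TransformedToyStream and FixedToyStream are hypothetical names, and a plain
Seq stands in for an RDD. It only illustrates that, in such a model, a map
backed by a dedicated subclass and a map expressed through transform produce
the same batches.

```scala
// Toy model only: none of these classes exist in Spark.
// A "batch" is a plain Seq standing in for an RDD.
abstract class ToyStream[T] {
  // Produce the batch for a given time step, if there is one.
  def compute(time: Long): Option[Seq[T]]

  // Variant 1: map backed by a dedicated subclass.
  def mapViaSubclass[U](f: T => U): ToyStream[U] =
    new MappedToyStream(this, f)

  // General batch-to-batch transformation.
  def transform[U](f: Seq[T] => Seq[U]): ToyStream[U] =
    new TransformedToyStream(this, f)

  // Variant 2: map expressed through transform.
  def mapViaTransform[U](f: T => U): ToyStream[U] =
    transform(batch => batch.map(f))
}

// Applies f to every element of the parent's batch.
class MappedToyStream[T, U](parent: ToyStream[T], f: T => U)
    extends ToyStream[U] {
  def compute(time: Long): Option[Seq[U]] =
    parent.compute(time).map(_.map(f))
}

// Applies f to the parent's batch as a whole.
class TransformedToyStream[T, U](parent: ToyStream[T], f: Seq[T] => Seq[U])
    extends ToyStream[U] {
  def compute(time: Long): Option[Seq[U]] =
    parent.compute(time).map(f)
}

// A fixed input source: at most one batch per time step.
class FixedToyStream[T](batches: Map[Long, Seq[T]]) extends ToyStream[T] {
  def compute(time: Long): Option[Seq[T]] = batches.get(time)
}

object ToyStreamDemo {
  def main(args: Array[String]): Unit = {
    val input = new FixedToyStream(Map(0L -> Seq(1, 2, 3), 1L -> Seq(4, 5)))
    val viaSubclass = input.mapViaSubclass(_ * 2)
    val viaTransform = input.mapViaTransform(_ * 2)
    // Time step 2 has no batch, so both variants should return None there.
    for (t <- 0L to 2L) {
      assert(viaSubclass.compute(t) == viaTransform.compute(t))
      println(s"t=$t subclass=${viaSubclass.compute(t)} transform=${viaTransform.compute(t)}")
    }
  }
}
```

In this sketch the two variants differ only in which wrapper class sits
between the parent and the result. Whether the real MappedDStream does
anything beyond that is exactly what I am asking.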


Regards,
Madhukara Phatak
http://datamantra.io/

On Mon, Mar 16, 2015 at 2:14 PM, Shao, Saisai <saisai.shao@intel.com> wrote:

>  I think both ways are OK for writing a streaming job.
> `transform` is the more general way to go from one DStream to
> another when there’s no corresponding DStream API (but there is a
> corresponding RDD API). But using map may be more straightforward and
> easier to understand.
>
>
>
> Thanks
>
> Jerry
>
>
>
> *From:* madhu phatak [mailto:phatak.dev@gmail.com]
> *Sent:* Monday, March 16, 2015 4:32 PM
> *To:* user@spark.apache.org
> *Subject:* MappedStream vs Transform API
>
>
>
> Hi,
>
>   The current implementation of the map function in Spark Streaming is
> shown below.
>
>
>
>   def map[U: ClassTag](mapFunc: T => U): DStream[U] = {
>     new MappedDStream(this, context.sparkContext.clean(mapFunc))
>   }
>
>  It creates an instance of MappedDStream, which is a subclass of DStream.
>
>
>
> The same function can also be implemented using the transform API:
>
>
>
>   def map[U: ClassTag](mapFunc: T => U): DStream[U] =
>     this.transform(rdd => {
>       rdd.map(mapFunc)
>     })
>
>
>
> Both implementations look the same. If they are the same, is there any
> advantage to having a subclass of DStream? Why can't we just use the
> transform API?
>
>
>
>
>
> Regards,
> Madhukara Phatak
> http://datamantra.io/
>
