spark-user mailing list archives

From Tathagata Das <t...@databricks.com>
Subject Re: MappedStream vs Transform API
Date Mon, 16 Mar 2015 18:12:38 GMT
It's mostly for legacy reasons. We first added MappedDStream and the other
specialized DStream subclasses, and only later realized that we needed to
expose something more generic for arbitrary RDD-to-RDD transformations.
MappedDStream could easily be replaced by transform. However, there is some
value in keeping it, because it helps developers learn about DStreams.

TD

On Mon, Mar 16, 2015 at 3:37 AM, madhu phatak <phatak.dev@gmail.com> wrote:

> Hi,
>  Thanks for the response. I understand that part. But I am asking why the
> internal implementation uses a subclass when it could use an existing API.
> Unless there is a real difference, it feels like a code smell to me.
>
>
> Regards,
> Madhukara Phatak
> http://datamantra.io/
>
> On Mon, Mar 16, 2015 at 2:14 PM, Shao, Saisai <saisai.shao@intel.com>
> wrote:
>
>>  I think both ways are fine for writing a streaming job. `transform` is
>> the more general way to turn one DStream into another when there is no
>> matching DStream API but there is a matching RDD API. Using `map` may be
>> more straightforward and easier to understand, though.
>>
>>
>>
>> Thanks
>>
>> Jerry
>>
>>
>>
>> *From:* madhu phatak [mailto:phatak.dev@gmail.com]
>> *Sent:* Monday, March 16, 2015 4:32 PM
>> *To:* user@spark.apache.org
>> *Subject:* MappedStream vs Transform API
>>
>>
>>
>> Hi,
>>
>> The current implementation of the map function in Spark Streaming looks
>> like this:
>>
>>   def map[U: ClassTag](mapFunc: T => U): DStream[U] = {
>>     new MappedDStream(this, context.sparkContext.clean(mapFunc))
>>   }
>>
>> It creates an instance of MappedDStream, which is a subclass of DStream.
>>
>>
>>
>> The same function can also be implemented using the transform API:
>>
>>   def map[U: ClassTag](mapFunc: T => U): DStream[U] =
>>     this.transform(rdd => rdd.map(mapFunc))
>>
>>
>>
>> Both implementations look the same. If they are, is there any advantage
>> to having a subclass of DStream? Why can't we just use the transform API?
>>
>>
>>
>>
>>
>> Regards,
>> Madhukara Phatak
>> http://datamantra.io/
>>
>
>
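
The equivalence discussed in this thread can be seen from user code as well. The following is a minimal sketch, not taken from the thread: it assumes Spark Streaming 1.3 or later is on the classpath, and the object name, the local master setting, and the queue-backed test input are all illustrative choices. It applies the same doubling function once through `map` and once through `transform`, then uses `transform` for a sort, an RDD operation that has no DStream counterpart.

```scala
import scala.collection.mutable

import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical driver object; the name is an assumption for illustration.
object MapVsTransform {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("MapVsTransform")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Illustrative input: a queue-backed stream with one batch of 1 to 5.
    val batches = mutable.Queue[RDD[Int]](ssc.sparkContext.parallelize(1 to 5))
    val numbers = ssc.queueStream(batches)

    // Built-in operator; internally this creates a MappedDStream.
    val viaMap = numbers.map(_ * 2)

    // The same per-batch logic written as a generic RDD-to-RDD function.
    val viaTransform = numbers.transform(rdd => rdd.map(_ * 2))

    // DStream has no sortBy, so transform is the way to reach the RDD API.
    val sorted = numbers.transform(rdd => rdd.sortBy(x => x, ascending = false))

    viaMap.print()
    viaTransform.print()
    sorted.print()

    ssc.start()
    ssc.awaitTerminationOrTimeout(3000) // run briefly, then shut down
    ssc.stop()
  }
}
```

For each batch, `viaMap` and `viaTransform` should print the same elements, which is the point made in the original question. The third stream shows the case Jerry describes, where only the RDD API offers the operation.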
