spark-user mailing list archives

From madhu phatak <phatak....@gmail.com>
Subject Re: MappedStream vs Transform API
Date Tue, 17 Mar 2015 08:56:04 GMT
Hi,
 Thank you for the response.

 Can I submit a PR to use transform for all the functions like map, flatMap,
etc. so they are consistent with the other APIs?
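
For illustration, the proposed consolidation could look like the following
minimal sketch. This is not Spark's actual code (SimpleStream and batch are
made-up names for the example); it only demonstrates the pattern of expressing
the narrow operations through a single generic transform:

```scala
// Minimal sketch (not Spark's actual implementation): a stream-like wrapper
// that exposes one generic `transform`, with map/flatMap/filter layered on top.
case class SimpleStream[T](batch: List[T]) {

  // Generic batch-to-batch transformation, analogous to DStream.transform.
  def transform[U](f: List[T] => List[U]): SimpleStream[U] =
    SimpleStream(f(batch))

  // Narrow operations expressed via the generic one, as proposed above.
  def map[U](g: T => U): SimpleStream[U] =
    transform(_.map(g))

  def flatMap[U](g: T => List[U]): SimpleStream[U] =
    transform(_.flatMap(g))

  def filter(p: T => Boolean): SimpleStream[T] =
    transform(_.filter(p))
}

object Demo {
  def main(args: Array[String]): Unit = {
    val s = SimpleStream(List(1, 2, 3))
    println(s.map(_ * 2).batch)         // List(2, 4, 6)
    println(s.filter(_ % 2 == 1).batch) // List(1, 3)
  }
}
```

Every narrow operation then stays consistent with the generic one, at the cost
of losing the dedicated subclasses.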

Regards,
Madhukara Phatak
http://datamantra.io/

On Mon, Mar 16, 2015 at 11:42 PM, Tathagata Das <tdas@databricks.com> wrote:

> It's mostly for legacy reasons. First we added all the MappedDStream
> classes, etc., and later we realized we needed to expose something more
> generic for arbitrary RDD-to-RDD transformations. It can be easily replaced.
> However, there is slight value in having MappedDStream, as it helps
> developers learn about DStreams.
>
> TD
>
> On Mon, Mar 16, 2015 at 3:37 AM, madhu phatak <phatak.dev@gmail.com>
> wrote:
>
>> Hi,
>>  Thanks for the response. I understand that part. But I am asking why the
>> internal implementation uses a subclass when it could use an existing API.
>> Unless there is a real difference, it feels like a code smell to me.
>>
>>
>> Regards,
>> Madhukara Phatak
>> http://datamantra.io/
>>
>> On Mon, Mar 16, 2015 at 2:14 PM, Shao, Saisai <saisai.shao@intel.com>
>> wrote:
>>
>>>  I think these two ways are both OK for writing a streaming job.
>>> `transform` is a more general way to transform one DStream into another
>>> when there is no related DStream API (but there is a related RDD API).
>>> But using map may be more straightforward and easier to understand.
>>>
>>>
>>>
>>> Thanks
>>>
>>> Jerry
>>>
>>>
>>>
>>> From: madhu phatak [mailto:phatak.dev@gmail.com]
>>> Sent: Monday, March 16, 2015 4:32 PM
>>> To: user@spark.apache.org
>>> Subject: MappedStream vs Transform API
>>>
>>>
>>>
>>> Hi,
>>>
>>>   The current implementation of the map function in Spark Streaming looks
>>> as below:
>>>
>>>
>>>
>>>   def map[U: ClassTag](mapFunc: T => U): DStream[U] = {
>>>     new MappedDStream(this, context.sparkContext.clean(mapFunc))
>>>   }
>>>
>>>  It creates an instance of MappedDStream, which is a subclass of DStream.
>>>
>>>
>>>
>>> The same function can also be implemented using the transform API:
>>>
>>>
>>>
>>>   def map[U: ClassTag](mapFunc: T => U): DStream[U] =
>>>     this.transform(rdd => {
>>>       rdd.map(mapFunc)
>>>     })
>>>
>>>
>>>
>>> Both implementations look the same. If they are, is there any advantage
>>> to having a subclass of DStream? Why can't we just use the transform API?
>>>
>>>
>>>
>>>
>>>
>>> Regards,
>>> Madhukara Phatak
>>> http://datamantra.io/
>>>
>>
>>
>
