spark-user mailing list archives

From Rishi Yadav <ri...@infoobjects.com>
Subject Re: Declaring multiple RDDs and efficiency concerns
Date Fri, 14 Nov 2014 17:36:03 GMT
How about using a fluent style of Scala programming?
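
A minimal sketch of what a fluent style could look like, not a definitive implementation. To keep it runnable without Spark, a plain Seq[Int] stands in for the RDD (an assumption; the same pattern should carry over to RDD[Int]). The function bodies, the DataOps wrapper, and the step1/step2/step3 method names are all hypothetical placeholders.

```scala
object FluentSketch {
  // Assumption: Seq[Int] stands in for RDD[Int] so this runs without Spark.
  type Data = Seq[Int]

  // Hypothetical steps; the bodies are placeholders for the real logic.
  def function1(d: Data): Data = d.map(_ + 1)
  def function2(d: Data): Data = d.filter(_ % 2 == 0)
  def function3(d: Data): Data = d.map(_ * 10)

  // Option 1: chain the plain functions into a single pipeline,
  // applied left to right (function1, then function2, then function3).
  val pipeline: Data => Data =
    Function.chain(Seq[Data => Data](function1, function2, function3))

  // Option 2: an implicit class adds the steps as methods on Data,
  // which is what allows the dotted, fluent call style.
  implicit class DataOps(d: Data) {
    def step1: Data = function1(d)
    def step2: Data = function2(d)
    def step3: Data = function3(d)
  }

  val input: Data = Seq(1, 2, 3, 4)
  val viaCompose: Data = pipeline(input)          // Seq(20, 40)
  val viaFluent: Data = input.step1.step2.step3   // Seq(20, 40)
}
```

Both options produce the same result. The implicit class is closest to the `function1.function2.function3` form asked about, at the cost of one small wrapper per data type.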


On Fri, Nov 14, 2014 at 8:31 AM, Simone Franzini <captainfranz@gmail.com>
wrote:

> Let's say I have to apply a complex sequence of operations to a certain
> RDD.
> In order to make code more modular/readable, I would typically have
> something like this:
>
> object myObject {
>   def main(args: Array[String]): Unit = {
>     val rdd1 = function1(myRdd)
>     val rdd2 = function2(rdd1)
>     val rdd3 = function3(rdd2)
>   }
>
>   def function1(rdd: RDD) : RDD = { doSomething }
>   def function2(rdd: RDD) : RDD = { doSomethingElse }
>   def function3(rdd: RDD) : RDD = { doSomethingElseYet }
> }
>
> So I am explicitly declaring vals for the intermediate steps. Does this
> end up using more storage than if I just chained all of the operations and
> declared only one val instead?
> If yes, is there a better way to chain together the operations?
> Ideally I would like to do something like:
>
> val rdd = function1.function2.function3
>
> Is there a way I can write the signature of my functions to accomplish
> this? Is this also an efficiency issue or just a stylistic one?
>
> Simone Franzini, PhD
>
> http://www.linkedin.com/in/simonefranzini
>
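
On the storage question: Spark transformations are lazy, so a val holding an intermediate RDD is only a reference to a description of the computation. Nothing is materialized until an action runs, and nothing is kept afterwards unless you persist it. The sketch below illustrates the idea without Spark, using a lazy Scala view as a stand-in for an RDD (an assumption); the `evaluated` counter and the step names are hypothetical.

```scala
object LazySketch {
  // Counts how many elements have actually been processed.
  var evaluated = 0

  // Assumption: a lazy view stands in for an RDD; no Spark is involved.
  val source = (1 to 5).view

  // Named intermediate steps; each one only describes work to be done.
  val step1 = source.map { x => evaluated += 1; x + 1 }
  val step2 = step1.map(_ * 2)
  val step3 = step2.map(_ - 1)

  // No element has been processed yet, despite three named steps.
  val countBefore = evaluated

  // Forcing the result plays the role of a Spark action.
  val result = step3.toList
  val countAfter = evaluated
}
```

Naming the intermediate steps does not cause any extra work or storage here; the elements are processed only when the final result is forced. In that sense, chaining versus separate vals is a stylistic choice.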
