spark-dev mailing list archives

From Teng Long <longteng...@gmail.com>
Subject Re: Can I add a new method to RDD class?
Date Mon, 05 Dec 2016 22:14:32 GMT
I’m trying to implement a transformation that merges partitions (to align with GPU specs)
and moves them into GPU memory, for example rdd.toGPU(), so that later transformations like map
are automatically performed on the GPU. A second transformation, rdd.offGPU(), would move the partitions
out of GPU memory and repartition them the way they were on the CPU before.

Thank you, Tarun, for creating that gist. I’ll look at it and see if it meets my needs.
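
For readers following the thread, here is a minimal sketch of how the implicit-class approach from the quoted replies might cover the partition-merging half of this idea. It assumes spark-core is on the classpath and a local master is acceptable. The names GpuStyleSyntax, mergedInto, restoredTo and contextAppName are hypothetical and are not Spark API; no GPU transfer is performed. Note that RDD exposes its context through the public sparkContext accessor, so an extension defined outside Spark can still reach it.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical extension object; the method names are illustrative only.
object GpuStyleSyntax {
  implicit class GpuStyleOps[T](rdd: RDD[T]) {
    // Stand-in for toGPU(): merges partitions only, no GPU involved.
    def mergedInto(numPartitions: Int): RDD[T] = rdd.coalesce(numPartitions)

    // Stand-in for offGPU(): shuffles back to an earlier partition count.
    // This restores the count, not the original placement of elements.
    def restoredTo(numPartitions: Int): RDD[T] = rdd.repartition(numPartitions)

    // The RDD's context is reachable through the public accessor.
    def contextAppName: String = rdd.sparkContext.appName
  }
}

object GpuStyleDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("rdd-extension-sketch")
    val sc = new SparkContext(conf)
    try {
      import GpuStyleSyntax._ // brings the extension methods into scope
      val rdd = sc.parallelize(1 to 8, numSlices = 4)
      val merged = rdd.mergedInto(2)
      val restored = merged.map(_ * 2).restoredTo(rdd.getNumPartitions)
      println(s"merged=${merged.getNumPartitions} restored=${restored.getNumPartitions}")
      println(s"app=${rdd.contextAppName}")
    } finally {
      sc.stop()
    }
  }
}
```

Because the extension lives in application code, it compiles against a stock Spark release and needs no publishLocal of a modified build.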

> On Dec 5, 2016, at 5:07 PM, Tarun Kumar <tarunk1407@gmail.com> wrote:
> 
> Teng,
> 
> Can you please share the details of the transformation that you want to implement in your
method foo?
> 
> I have created a gist of one dummy transformation for your method foo; this foo method
transforms an RDD[T] into an RDD[(T,T)]. Many more such transformations can easily be achieved the same way.
> 
> https://gist.github.com/fidato13/3b46fe1c96b37ae0dd80c275fbe90e92
> 
> Thanks
> Tarun Kumar
> 
> On 5 December 2016 at 22:33, Thakrar, Jayesh <jthakrar@conversantmedia.com> wrote:
> Teng,
> 
>  
> 
> Before you go down creating your own custom Spark system, do give some thought to what
Holden and others are suggesting, viz. using implicit methods.
> 
>  
> 
> If you want real concrete examples, have a look at the Spark Cassandra Connector -
> 
>  
> 
> Here you will see an example of "extending" SparkContext - https://github.com/datastax/spark-cassandra-connector/blob/master/doc/2_loading.md
>  
> 
> // validation is deferred, so it is not triggered during rdd creation
> 
> val rdd = sc.cassandraTable[SomeType]("ks", "not_existing_table")
> 
> val emptyRDD = rdd.toEmptyCassandraRDD
> 
>  
> 
> val emptyRDD2 = sc.emptyCassandraTable[SomeType]("ks", "not_existing_table")
> 
>  
> 
>  
> 
> And here you will see an example of "extending" RDD - https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md
>  
> 
> case class WordCount(word: String, count: Long)
> 
> val collection = sc.parallelize(Seq(WordCount("dog", 50), WordCount("cow", 60)))
> 
> collection.saveToCassandra("test", "words", SomeColumns("word", "count"))
> 
>  
> 
> Hope that helps…
> 
> Jayesh
> 
>  
> 
>  
> 
> From: Teng Long <longteng.cq@gmail.com>
> Date: Monday, December 5, 2016 at 3:04 PM
> To: Holden Karau <holden@pigscanfly.ca>, <dev@spark.apache.org>
> Subject: Re: Can I add a new method to RDD class?
> 
>  
> 
> Thank you for providing another answer, Holden.
> 
>  
> 
> So I did what Tarun and Michal suggested, and it didn’t work out, because I want to have
a new transformation method in the RDD class and need to use that RDD’s Spark context, which
is private. So I guess the only thing I can do now is sbt publishLocal?
> 
>  
> 
> On Dec 5, 2016, at 9:19 AM, Holden Karau <holden@pigscanfly.ca> wrote:
> 
>  
> 
> Doing that requires publishing a custom version of Spark. You can edit the version number
and do a publishLocal, but maintaining that change is going to be difficult. The other approaches
suggested are probably better, but also, does your method need to be defined on the RDD class?
Could you instead make a helper object or class to expose whatever functionality you need?
> 
>  
> 
> On Mon, Dec 5, 2016 at 6:06 PM long <longteng.cq@gmail.com> wrote:
> 
> Thank you very much! But why can’t I just add new methods to the source code of
RDD?
> 
>  
> 
> On Dec 5, 2016, at 3:15 AM, Michal Šenkýř [via Apache Spark Developers List] <[hidden email]> wrote:
> 
>  
> 
> A simple Scala example of implicit classes:
> 
> implicit class EnhancedString(str: String) {
>   def prefix(prefix: String) = prefix + str
> }
>  
> println("World".prefix("Hello "))
> As Tarun said, you have to import it if it's not in the same class where you use it.
> 
> Hope this makes it clearer,
> 
> Michal Senkyr
> 
>  
> 
> On 5.12.2016 07:43, Tarun Kumar wrote:
> 
> Not sure if that's documented in terms of Spark, but this is a fairly common pattern in
Scala known as the "pimp my library" pattern; you can easily find many generic examples of
this pattern. If you want, I can quickly cook up a short complete example with an RDD (although
there is nothing really more to it than my example in the earlier mail). Thanks Tarun Kumar
> 
>  
> 
> On Mon, 5 Dec 2016 at 7:15 AM, long <[hidden email]> wrote:
> 
> So is there documentation of this I can refer to? 
> 
>  
> 
> On Dec 5, 2016, at 1:07 AM, Tarun Kumar [via Apache Spark Developers List] <[hidden email]> wrote:
> 
>  
> 
> Hi Tenglong,
> 
> In addition to trsell's reply, you can add any method to an RDD without making changes to
the Spark code. This can be achieved by using an implicit class in your own client code:
> 
> implicit class extendRDD[T](rdd: RDD[T]) {
>   def foo(): Unit = ???  // placeholder body; put the new behaviour here
> }
> 
> Then you basically need to import this implicit class into the scope where you want to use
the new foo method.
> 
> Thanks
> Tarun Kumar
> 
>  
> 
> On Mon, 5 Dec 2016 at 6:59 AM, <[hidden email]> wrote:
> 
> How does your application fetch the spark dependency? Perhaps list your project dependencies
and check it's using your dev build.
> 
>  
> 
> On Mon, 5 Dec 2016, 08:47 tenglong, <[hidden email]> wrote:
> 
> Hi,
> 
> Apparently, I've already tried adding a new method to RDD,
> 
> for example,
> 
> class RDD {
>   def foo() // this is the one I added
> 
>   def map()
> 
>   def collect()
> }
> 
> I can build Spark successfully, but I can't compile my application code
> which calls rdd.foo(), and the error message says
> 
> value foo is not a member of org.apache.spark.rdd.RDD[String]
> 
> So I am wondering if there is any mechanism that prevents me from doing this, or
> if there is something I'm doing wrong?
> 
> 
> 
> 
> --
> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Can-I-add-a-new-method-to-RDD-class-tp20100.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
> 
> 
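
The import requirement that Tarun and Michal mention in the thread can be sketched without Spark. This is only an illustration built on Michal's EnhancedString example; the object names StringSyntax and ImportDemo are hypothetical.

```scala
// Hypothetical names; only the implicit-class mechanism comes from the thread.
object StringSyntax {
  implicit class EnhancedString(str: String) {
    def prefix(p: String): String = p + str
  }
}

object ImportDemo {
  // Without this import the call below does not compile, and the compiler
  // reports that prefix is not a member of String.
  import StringSyntax._

  def greet(name: String): String = name.prefix("Hello ")

  def main(args: Array[String]): Unit =
    println(greet("World")) // prints "Hello World"
}
```

The same layout applies to an RDD extension: keep the implicit class in an object in your own code and import that object wherever the new method is called.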

