spark-user mailing list archives

From Mayur Rustagi <mayur.rust...@gmail.com>
Subject Re: Can I share the RDD between multiprocess
Date Sat, 25 Jan 2014 18:20:47 GMT
Would the Spark Job Server work here?
http://engineering.ooyala.com/blog/open-sourcing-our-spark-job-server

Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi



On Sat, Jan 25, 2014 at 10:46 PM, Kapil Malik <kmalik@adobe.com> wrote:

>  Thanks a lot Mark and Christopher for your prompt replies and
> clarification.
>
> Regards,
>
> Kapil Malik | kmalik@adobe.com
>
> *From:* Christopher Nguyen [mailto:ctn@adatao.com]
> *Sent:* 25 January 2014 22:34
> *To:* user@spark.incubator.apache.org
> *Subject:* RE: Can I share the RDD between multiprocess
>
> Kapil, that's right, your #2 is the pattern I was referring to. Of course
> it could be Tomcat or something even lighter weight as long as you define
> some suitable client/server protocol.
>
> Sent while mobile. Pls excuse typos etc.
>
> On Jan 25, 2014 6:03 AM, "Kapil Malik" <kmalik@adobe.com> wrote:
>
> Hi Christopher,
>
> “make a "server" out of that JVM, and serve up (via HTTP/THRIFT, etc.)
> some kind of reference to those RDDs to multiple clients of that server”
>
> Can you kindly hint at any starting points regarding your suggestion?
>
> In my understanding, the SparkContext constructor creates an Akka actor
> system and starts a Jetty UI server. So can we somehow use / tweak these to
> serve multiple clients? Or can we simply construct a SparkContext inside a
> Java server (like Tomcat)?
>
> Regards,
>
> Kapil Malik | kmalik@adobe.com | 33430 / 8800836581
>
> *From:* Christopher Nguyen [mailto:ctn@adatao.com]
> *Sent:* 25 January 2014 12:00
> *To:* user@spark.incubator.apache.org
> *Subject:* Re: Can I share the RDD between multiprocess
>
> D.Y., it depends on what you mean by "multiprocess".
>
> RDD lifecycles are currently limited to a single SparkContext. So to
> "share" RDDs you need to somehow access the same SparkContext.
>
> This means one way to share RDDs is to make sure your accessors are in the
> same JVM that started the SparkContext.
>
> Another is to make a "server" out of that JVM, and serve up (via
> HTTP/THRIFT, etc.) some kind of reference to those RDDs to multiple clients
> of that server, even though there is only one SparkContext (held by the
> server). We have built a server product using this pattern so I know it can
> work well.
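As an illustrative aside, the pattern Christopher describes (one long-lived process owns the SparkContext, and multiple clients talk to it over HTTP rather than creating their own contexts) can be sketched in plain Python. Here a dict of named in-memory datasets is a stand-in for the RDDs cached in the single context; this is not Spark API code, just the shape of the protocol:

```python
# Minimal sketch of the "one process owns the context, clients query it over
# HTTP" pattern. DATASETS plays the role of RDDs cached in the one long-lived
# SparkContext; clients fetch computed results by name instead of rebuilding
# the data in their own process.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# Stand-in for RDDs cached in the single shared context.
DATASETS = {"events": list(range(10))}

class RDDServer(BaseHTTPRequestHandler):
    def do_GET(self):
        name = self.path.lstrip("/")
        if name not in DATASETS:
            self.send_error(404)
            return
        # In real Spark this would be an action on a cached RDD,
        # e.g. rdd.sum(), executed inside the server's SparkContext.
        body = json.dumps({"name": name, "sum": sum(DATASETS[name])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet

server = HTTPServer(("127.0.0.1", 0), RDDServer)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Any number of client processes could issue this same request; they all
# share the one "context" instead of each building their own.
url = f"http://127.0.0.1:{server.server_port}/events"
result = json.loads(urlopen(url).read())
print(result["sum"])  # 45
server.shutdown()
```

The Spark Job Server linked earlier in the thread productizes essentially this idea: a REST front end over shared, named SparkContexts.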
>
>
>   --
>
> Christopher T. Nguyen
>
> Co-founder & CEO, Adatao <http://adatao.com>
>
> linkedin.com/in/ctnguyen
>
> On Fri, Jan 24, 2014 at 6:06 PM, D.Y Feng <yyfeng88625@gmail.com> wrote:
>
> How can I share an RDD between multiple processes?
>
>
> --
>
>
> DY.Feng (叶毅锋)
> yyfeng88625@twitter
> Department of Applied Mathematics
> Guangzhou University, China
> dyfeng@stu.gzhu.edu.cn
>
