spark-user mailing list archives

From Mark Hamstra <m...@clearstorydata.com>
Subject Re: Can I share the RDD between multiprocess
Date Sat, 25 Jan 2014 17:02:21 GMT
It's a basic strategy that several organizations using Spark have followed,
but there isn't yet a canonical implementation or example of such a server
in the Spark source code.  That is likely to change before the 1.0 release,
and the included job server is likely to be based on an updated/expanded
version of an existing pull request
<https://github.com/apache/incubator-spark/pull/222>.
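
For readers looking for a starting point, here is a minimal sketch of the
general pattern discussed in this thread: one long-lived process owns the
datasets and hands out name-based references to clients over HTTP. It is
not the job server from the pull request above and it uses no Spark API.
The names DatasetRegistry, start_server and client_count, and the URL
scheme /datasets/<name>/count, are all hypothetical. Plain Python lists
stand in for RDDs; in a real server the registry values would presumably
be RDDs created from the single SparkContext, and len(dataset) would be
replaced by a call on the RDD.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class DatasetRegistry:
    """Maps client-visible names to datasets owned by this one process.

    Assumption: values are plain lists standing in for RDDs that would
    all belong to the single SparkContext held by the server.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._datasets = {}

    def put(self, name, dataset):
        with self._lock:
            self._datasets[name] = dataset

    def get(self, name):
        with self._lock:
            return self._datasets.get(name)


def make_handler(registry):
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Hypothetical URL scheme: /datasets/<name>/count
            parts = self.path.strip("/").split("/")
            valid = len(parts) == 3 and parts[0] == "datasets" and parts[2] == "count"
            dataset = registry.get(parts[1]) if valid else None
            if dataset is None:
                self.send_error(404)
                return
            # Stand-in for running an action on the shared dataset.
            body = json.dumps({"name": parts[1], "count": len(dataset)}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return Handler


def start_server(registry, port=0):
    """Starts the server on a background thread; port=0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), make_handler(registry))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def client_count(port, name):
    """A client only knows the dataset's name, never the object itself."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("GET", f"/datasets/{name}/count")
        resp = conn.getresponse()
        payload = resp.read()
        if resp.status != 200:
            return None
        return json.loads(payload)["count"]
    finally:
        conn.close()
```

Any number of client processes can call client_count against the same
port. They all see the same registered dataset because only the server
process ever holds it.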


On Sat, Jan 25, 2014 at 6:02 AM, Kapil Malik <kmalik@adobe.com> wrote:

>  Hi Christopher,
>
>
>
> “make a "server" out of that JVM, and serve up (via HTTP/THRIFT, etc.)
> some kind of reference to those RDDs to multiple clients of that server”
>
>
>
> Can you kindly hint at any starting points regarding your suggestion?
>
> In my understanding, the SparkContext constructor creates an Akka actor
> system and starts a Jetty UI server. So can we somehow use or tweak these
> to serve multiple clients? Or can we simply construct a SparkContext
> inside a Java server (like Tomcat)?
>
>
>
> Regards,
>
>
>
> Kapil Malik | kmalik@adobe.com | 33430 / 8800836581
>
>
>
> *From:* Christopher Nguyen [mailto:ctn@adatao.com]
> *Sent:* 25 January 2014 12:00
> *To:* user@spark.incubator.apache.org
> *Subject:* Re: Can I share the RDD between multiprocess
>
>
>
> D.Y., it depends on what you mean by "multiprocess".
>
>
>
> RDD lifecycles are currently limited to a single SparkContext. So to
> "share" RDDs you need to somehow access the same SparkContext.
>
>
>
> This means one way to share RDDs is to make sure your accessors are in the
> same JVM that started the SparkContext.
>
>
>
> Another is to make a "server" out of that JVM, and serve up (via
> HTTP/THRIFT, etc.) some kind of reference to those RDDs to multiple clients
> of that server, even though there is only one SparkContext (held by the
> server). We have built a server product using this pattern so I know it can
> work well.
>
>
>   --
>
> Christopher T. Nguyen
>
> Co-founder & CEO, Adatao <http://adatao.com>
>
> linkedin.com/in/ctnguyen
>
>
>
>
>
> On Fri, Jan 24, 2014 at 6:06 PM, D.Y Feng <yyfeng88625@gmail.com> wrote:
>
> How can I share the RDD between multiprocess?
>
>
> --
>
>
> DY.Feng(叶毅锋)
> yyfeng88625@twitter
> Department of Applied Mathematics
> Guangzhou University,China
> dyfeng@stu.gzhu.edu.cn
>
>
>
>
