spark-user mailing list archives

From Kapil Malik <kma...@adobe.com>
Subject RE: Can I share the RDD between multiprocess
Date Sat, 25 Jan 2014 14:02:41 GMT
Hi Christopher,

“make a "server" out of that JVM, and serve up (via HTTP/THRIFT, etc.) some kind of reference
to those RDDs to multiple clients of that server”

Can you kindly hint at any starting points for your suggestion?
In my understanding, the SparkContext constructor creates an Akka actor system and starts a Jetty UI server. Can we somehow use or tweak those to serve multiple clients? Or can we simply construct a SparkContext inside a Java server (like Tomcat)?
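
For reference, a minimal Scala sketch of the Tomcat angle: a ServletContextListener that creates one SparkContext when the webapp starts and stops it at shutdown, so every servlet in that JVM can reuse it. The master URL, app name, and attribute key are illustrative assumptions, not anything confirmed in this thread.

    import javax.servlet.{ServletContextEvent, ServletContextListener}
    import org.apache.spark.SparkContext

    class SparkContextListener extends ServletContextListener {
      override def contextInitialized(e: ServletContextEvent): Unit = {
        // One SparkContext per webapp JVM; master URL and app name are placeholders.
        val sc = new SparkContext("spark://master:7077", "tomcat-hosted-app")
        // Stash it where servlets running in this JVM can find it.
        e.getServletContext.setAttribute("sparkContext", sc)
      }
      override def contextDestroyed(e: ServletContextEvent): Unit = {
        e.getServletContext.getAttribute("sparkContext") match {
          case sc: SparkContext => sc.stop()
          case _ =>
        }
      }
    }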

Regards,

Kapil Malik | kmalik@adobe.com | 33430 / 8800836581

From: Christopher Nguyen [mailto:ctn@adatao.com]
Sent: 25 January 2014 12:00
To: user@spark.incubator.apache.org
Subject: Re: Can I share the RDD between multiprocess

D.Y., it depends on what you mean by "multiprocess".

RDD lifecycles are currently limited to a single SparkContext. So to "share" RDDs you need
to somehow access the same SparkContext.

This means one way to share RDDs is to make sure your accessors are in the same JVM that started
the SparkContext.
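
A minimal sketch of that same-JVM case in Scala: one SparkContext held in a singleton, with cached RDDs kept in a concurrent registry that any thread in the JVM can look up. The names here (SharedRdds, "lines", the HDFS path) are illustrative assumptions only.

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD
    import scala.collection.concurrent.TrieMap

    object SharedRdds {
      // One SparkContext for the whole JVM; master URL and app name are placeholders.
      val sc = new SparkContext("local[*]", "shared-rdd-demo")
      private val registry = TrieMap.empty[String, RDD[_]]

      // Cache the RDD and register it under a name any thread can look up.
      def put(name: String, rdd: RDD[_]): Unit = registry.put(name, rdd.cache())
      def get(name: String): Option[RDD[_]] = registry.get(name)
    }

    // From any thread in the same JVM:
    //   SharedRdds.put("lines", SharedRdds.sc.textFile("hdfs://host/path"))
    //   SharedRdds.get("lines").foreach(rdd => println(rdd.count()))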

Another is to make a "server" out of that JVM, and serve up (via HTTP/THRIFT, etc.) some kind
of reference to those RDDs to multiple clients of that server, even though there is only one
SparkContext (held by the server). We have built a server product using this pattern so I
know it can work well.
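
A minimal sketch of that server pattern, using only Spark and the JDK's built-in com.sun.net.httpserver so no extra framework is assumed (this is not Adatao's product code). The port, endpoint, and RDD name are illustrative: clients call the server over HTTP, the single SparkContext never leaves the server JVM, and only job results cross the wire.

    import java.net.InetSocketAddress
    import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD
    import scala.collection.concurrent.TrieMap

    object RddServer {
      def main(args: Array[String]): Unit = {
        // The single SparkContext lives in this server JVM only.
        val sc = new SparkContext("local[*]", "rdd-server")
        val rdds = TrieMap.empty[String, RDD[String]]
        rdds.put("lines", sc.textFile("hdfs://host/path").cache())

        // GET /count?name=lines returns the count of the named cached RDD.
        val server = HttpServer.create(new InetSocketAddress(8090), 0)
        server.createContext("/count", new HttpHandler {
          def handle(ex: HttpExchange): Unit = {
            val name = Option(ex.getRequestURI.getQuery).getOrElse("").split("=") match {
              case Array(_, v) => v
              case _           => "lines"
            }
            val body = rdds.get(name).map(_.count().toString).getOrElse("unknown RDD")
            val bytes = body.getBytes("UTF-8")
            ex.sendResponseHeaders(200, bytes.length)
            val os = ex.getResponseBody
            os.write(bytes)
            os.close()
          }
        })
        server.start()
      }
    }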

--
Christopher T. Nguyen
Co-founder & CEO, Adatao (http://adatao.com)
linkedin.com/in/ctnguyen


On Fri, Jan 24, 2014 at 6:06 PM, D.Y Feng <yyfeng88625@gmail.com> wrote:
How can I share an RDD between multiple processes?

--


DY.Feng (叶毅锋)
yyfeng88625@twitter
Department of Applied Mathematics
Guangzhou University, China
dyfeng@stu.gzhu.edu.cn

