spark-user mailing list archives

From Dominik Hübner <>
Subject Exchanging data between pyspark and scala
Date Wed, 03 Sep 2014 09:42:49 GMT
I am about to implement a Spark app that will require both PySpark and Spark on Scala.

Data should be read from AWS S3 (compressed CSV files) and must be pre-processed by an existing
Python codebase. However, our final goal is to make those datasets available to Spark apps
written in either Python or Scala, e.g. through Tachyon.

S3 => Pyspark => Tachyon => {Py, Scala}Spark

Is there any recommended way to pass data between Spark applications implemented in different
languages? I thought about using a serialisation framework like Thrift or Avro, but maybe
there are other ways to do this (ideally without writing CSV files). I am open to any kind
of input!
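One language-neutral option that avoids CSV is to have the PySpark stage emit records in a self-describing text format such as JSON lines (one JSON object per line), which every Spark language binding can parse; a serialisation framework like Avro would play the same role with a proper schema. A minimal sketch of the interchange idea in plain Python — the record fields are hypothetical, and no Spark or Tachyon is needed to run it:

```python
import json

# Hypothetical records produced by the Python pre-processing step
# (field names are illustrative, not from the original question).
records = [
    {"user_id": 1, "score": 0.75},
    {"user_id": 2, "score": 0.42},
]

# Writer side (would run inside the PySpark job, e.g. via saveAsTextFile):
# one JSON object per line, a format any downstream consumer can read.
lines = [json.dumps(r, sort_keys=True) for r in records]

# Reader side (would run in the Scala job with any JSON library):
# parse each line back into a record.
decoded = [json.loads(line) for line in lines]

assert decoded == records
```

The same write-once/read-anywhere pattern applies whether the intermediate store is Tachyon, HDFS, or S3; swapping JSON lines for Avro or Parquet adds a schema and better compression at the cost of an extra dependency on each side.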
