spark-user mailing list archives

From: Matei Zaharia <matei.zaha...@gmail.com>
Subject: Re: Writing your own RDD
Date: Wed, 07 Aug 2013 23:44:42 GMT
Hi Usman,

I believe the easiest way would be to create an RDD of Strings in Java or Scala. It's pretty easy to wrap that into a PySpark RDD object. For example, take a look at pyspark.SparkContext.textFile in context.py: it just creates a Java RDD of Strings and then wraps it.
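
For reference, a minimal sketch of that pattern, modeled on what textFile does in context.py. The JVM class (com.example.TimeSeriesRDD), its create() method, and the exact RDD constructor arguments are assumptions for illustration only; the wrapping signature has varied across Spark versions.

    # Hypothetical sketch: wrap a JVM-side RDD of Strings in a PySpark RDD,
    # following the pattern used by pyspark.SparkContext.textFile.
    from pyspark import SparkContext
    from pyspark.rdd import RDD

    sc = SparkContext("local", "custom-rdd-demo")

    # Assumption: com.example.TimeSeriesRDD is your own Scala class on the
    # classpath, reached through PySpark's Py4J gateway; create() would
    # return a JavaRDD[String], e.g. with one partition per range scan.
    jrdd = sc._jvm.com.example.TimeSeriesRDD.create(sc._jsc.sc())

    # Wrap the Java RDD of Strings in a Python RDD object, just as
    # textFile does (constructor arguments differ by Spark version).
    rdd = RDD(jrdd, sc)

    print(rdd.take(5))

Once wrapped this way, the object supports the usual Python-side transformations (map, filter, etc.), since those are applied on top of the underlying Java RDD.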

Matei

On Aug 5, 2013, at 5:59 PM, Usman Masood <usmanm@locu.com> wrote:

> Hey,
> 
> Is it possible to write a custom RDD in Python using PySpark? We have an HTTP API for
> reading time series data which supports range scans (so it should be easy to partition the
> data), and we're considering using Spark to analyze that data. If we can't write an RDD in
> Python, is it possible to write one in Scala and then make use of it in Python land?
> 
> Usman

