spark-user mailing list archives

From: Peter Aberline <peter.aberl...@gmail.com>
Subject: PySpark sequence file support
Date: Fri, 18 Oct 2013 09:10:49 GMT
Hi,

I've just noticed that reading sequence files does not appear to be supported by the PySpark
API yet. Is that right?

How difficult would it be for me to add this feature, given that I'm not familiar with the
code base?

Alternatively, is there a workaround for this? My data is in a single, very large sequence
file containing more than 250,000 elements, and my code is already written in Python. I'm
writing the sequence file with Pydoop, so perhaps I could build an RDD by reading the file
in through Pydoop?
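A minimal sketch of the Pydoop idea, assuming every record can be read in the driver process and held in memory there: read the key/value pairs into a list and hand it to `SparkContext.parallelize`, which splits it into partitions. The helper `rdd_from_driver_reader` and the `read_records` callable are hypothetical names for illustration; `read_records` stands in for whatever Pydoop-based reading code produces (key, value) pairs, and is not a claim about Pydoop's actual API.

```python
from typing import Any, Callable, Iterable, List, Tuple

# A (key, value) pair as it might come out of a sequence-file reader.
Record = Tuple[Any, Any]


def rdd_from_driver_reader(sc, read_records: Callable[[str], Iterable[Record]],
                           path: str, num_slices: int = 64):
    """Read every record on the driver, then distribute them as an RDD.

    `sc` is assumed to be a pyspark.SparkContext. `read_records` is a
    hypothetical callable (e.g. a thin wrapper around Pydoop-based reading
    code) that yields (key, value) pairs from the sequence file at `path`.
    """
    # Materialise all records on the driver: they must fit in its memory.
    records: List[Record] = list(read_records(path))
    # parallelize() slices the list into num_slices partitions across the cluster.
    return sc.parallelize(records, num_slices)
```

With a real `SparkContext`, usage would look like `rdd_from_driver_reader(sc, my_reader, "/path/to/data.seq")`, where `my_reader` is your own (hypothetical) reading function. The trade-off is that the whole file passes through a single driver process, so whether this is practical for 250,000 elements depends on how large each element is.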

Thanks,
Peter