spark-user mailing list archives

From: Nick Allen <n...@nickallen.org>
Subject: How to 'Pipe' Binary Data in Apache Spark
Date: Fri, 16 Jan 2015 15:09:32 GMT

I have an RDD containing binary data. I would like to use 'RDD.pipe' to
pipe that binary data to an external program that translates it to
string/text data. Unfortunately, it seems that Spark is mangling the binary
data before it is passed to the external program.

This code is representative of what I am trying to do. What am I doing
wrong? How can I pipe binary data in Spark? Maybe the data is getting
corrupted when I initially read it in with 'textFile'?

bin = sc.textFile("binary-data.dat")
csv = bin.pipe("/usr/bin/binary-to-csv.sh")
csv.saveAsTextFile("text-data.csv")

Specifically, I am trying to use Spark to transform pcap (packet capture)
data to text/csv so that I can perform an analysis on it.
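
To illustrate the suspicion about 'textFile', here is a minimal sketch in plain Python (no Spark involved) of what a text-oriented read could do to binary input. It assumes the reader decodes the bytes as UTF-8 and splits them into records at newline bytes, and that the records are later written back out one per line; that is a simplified model, not a description of Spark's actual code path. The sample bytes and the helper names 'read_as_text_records' and 'write_records' are hypothetical.

```python
# Hypothetical sample bytes standing in for binary input (not real pcap data).
original = bytes([0xD4, 0xC3, 0xB2, 0xA1, 0x0A, 0x00, 0xFF, 0x0D, 0x0A, 0x41])


def read_as_text_records(data: bytes) -> list:
    """Simplified model of a text read: decode as UTF-8, then split on newlines.

    Bytes that are not valid UTF-8 are replaced with U+FFFD, so the
    original byte values cannot be recovered afterwards.
    """
    text = data.decode("utf-8", errors="replace")
    return text.split("\n")


def write_records(records: list) -> bytes:
    """Simplified model of handing records to a child process, one per line."""
    return "\n".join(records).encode("utf-8")


records = read_as_text_records(original)
round_tripped = write_records(records)

# Every 0x0A byte in the input is treated as a record boundary.
print(len(records), "records")
# Compare what the external program would receive with the original bytes.
print("lossless:", round_tripped == original)
```

If this model is roughly right, plain text survives the round trip but arbitrary binary data does not, which would be consistent with the mangling described above.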

Thanks!

-- 
Nick Allen <nick@nickallen.org>
