spark-user mailing list archives

From Silvio Fiorito <>
Subject Re: How to 'Pipe' Binary Data in Apache Spark
Date Thu, 22 Jan 2015 16:38:09 GMT

Have you tried

I've used this in a Spark app already and didn't have any issues. My use case was slightly
different from yours, but you should give it a try.

From: Nick Allen <<>>
Date: Friday, January 16, 2015 at 10:09 AM
To: "<>" <<>>
Subject: How to 'Pipe' Binary Data in Apache Spark

I have an RDD containing binary data. I would like to use 'RDD.pipe' to pipe that binary data
to an external program that will translate it to string/text data. Unfortunately, it seems
that Spark is mangling the binary data before it gets passed to the external program.

This code is representative of what I am trying to do. What am I doing wrong? How can I pipe
binary data in Spark?  Maybe it is getting corrupted when I read it in initially with 'textFile'?

bin = sc.textFile("binary-data.dat")
csv = bin.pipe("/usr/bin/")

Specifically, I am trying to use Spark to transform pcap (packet capture) data to text/csv
so that I can perform an analysis on it.
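A note on why this mangles binary data: RDD.pipe is line-oriented, so each element is written to the external program's stdin as a text line, and textFile already splits and decodes the input as UTF-8 text on the way in. Raw pcap bytes containing newlines or invalid UTF-8 sequences won't survive that transport. One workaround is a sketch like the following, assuming a recent Spark version with sc.binaryFiles: read whole files as bytes, base64-encode each record so it is newline-free ASCII that passes safely through pipe, and have the external program decode before parsing (the tool name and flag below are hypothetical).

```python
import base64

def encode_record(raw_bytes):
    # Base64 output is pure ASCII with no embedded newlines, so it
    # survives RDD.pipe's line-oriented text transport intact.
    return base64.b64encode(raw_bytes).decode("ascii")

def decode_record(line):
    # The external program (or a downstream stage) reverses the encoding.
    return base64.b64decode(line)

# Hypothetical Spark usage (sc is an active SparkContext):
#   raw  = sc.binaryFiles("hdfs:///captures/*.pcap")    # (path, bytes) pairs
#   safe = raw.map(lambda kv: encode_record(kv[1]))
#   csv  = safe.pipe("/usr/bin/pcap-to-csv --base64")   # hypothetical tool
```

The trade-off is roughly 33% size overhead from base64, but it keeps arbitrary bytes intact across a text-only channel.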


Nick Allen <<>>