spark-user mailing list archives

From Richard Marscher
Subject Re: Create RDD from output of unix command
Date Wed, 08 Jul 2015 20:25:23 GMT
As a distributed data processing engine, Spark should be fine with millions
of lines; it's built with massive data sets in mind. Do you have more
details on how you expect the output of a Unix command to interact with a
running Spark application? Do you expect Spark to be running continuously
and somehow observing the command's output? Or are you thinking more along
the lines of running a Unix command once and then running a Spark job
against whatever output it produced? If it's the latter, it should be as
simple as writing the command output to a file and then loading that file
into an RDD in Spark.
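
A minimal sketch of that latter approach, assuming PySpark and Python 3. The
helper names, the /tmp path, and the `seq` command are illustrative stand-ins,
not anything from the original question. The command's stdout is streamed
straight into a file, so the lines are never collected into one in-memory
array on the driver:

```python
import subprocess


def write_command_output(cmd, path):
    """Run cmd (a list of arguments) and stream its stdout straight into path.

    The output goes from the child process to the file, so it is never held
    in Python memory as one big list of lines.
    """
    with open(path, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)  # raises if cmd fails
    return path


def lines_rdd(sc, path):
    """Load the file as an RDD of lines; sc is an existing SparkContext."""
    # Assumption: path is readable from every executor (local mode, or a
    # shared filesystem such as HDFS).
    return sc.textFile(path)


def run_example():
    """Hypothetical driver: `seq` stands in for the real Unix command."""
    # Imported here so the helpers above need only the standard library.
    from pyspark import SparkContext

    sc = SparkContext(appName="unix-command-output")
    path = write_command_output(["seq", "1", "1000000"], "/tmp/command_output.txt")
    print(lines_rdd(sc, path).count())
    sc.stop()
```

On a real cluster the file would probably need to be copied to shared
storage before calling `sc.textFile`, since a path that exists only on the
driver machine is not visible to the workers.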

On Wed, Jul 8, 2015 at 2:02 PM, foobar wrote:

> What's the best practice of creating RDD from some external unix command
> output? I assume if the output size is large (say millions of lines),
> creating RDD from an array of all lines is not a good idea? Thanks!

*Richard Marscher*
Software Engineer