spark-user mailing list archives

From Andy Davidson <A...@SantaCruzIntegration.com>
Subject Re: RDD pipe example. Is this a bug or a feature?
Date Fri, 19 Sep 2014 22:59:16 GMT
Hi Jey

Many thanks for the code example. Here is what I really want to do: I want
to use Spark Streaming with Python. Unfortunately, PySpark does not support
streaming yet. It was suggested that the way to work around this is to use an
RDD pipe. The example below was a little experiment.

You can think of my system as following the standard unix shell script pipe
design

Stream of data -> Spark -> downstream system not implemented in Spark

After seeing your example code I now understand how the stdin and stdout get
configured. 
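
For readers following the thread, here is a minimal plain-Java sketch of the stdin/stdout wiring described above. It is not Spark's implementation; it only imitates the behaviour with ProcessBuilder, assuming a POSIX `sh` is on the PATH. The class name PipeSketch and the helper pipe() are hypothetical. Each element is written to the child's stdin as one line, the child's stdout lines come back as the result, and stderr (where `set -x` traces go) is passed straight through to the console. That is why the traces were visible while the echo output was not: the echo output is returned to the caller, not printed.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PipeSketch {
    /** Feeds each element to the command's stdin, one per line, and returns its stdout lines. */
    public static List<String> pipe(List<String> elements, String... command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // stderr (for example `set -x` traces) goes straight to the parent's console.
        builder.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process process = builder.start();

        // Write on a separate thread so that a full stdout buffer cannot deadlock us.
        Thread writer = new Thread(() -> {
            try (BufferedWriter out = new BufferedWriter(
                    new OutputStreamWriter(process.getOutputStream(), StandardCharsets.UTF_8))) {
                for (String element : elements) {
                    out.write(element);
                    out.write('\n');
                }
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });
        writer.start();

        // Everything the child prints on stdout becomes an element of the result.
        List<String> result = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                result.add(line);
            }
        }
        writer.join();
        process.waitFor();
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Inline stand-in for RDDPipe.sh from the quoted message below.
        String script = "while read x; do echo \"RDDPipe.sh $x\"; done";
        List<String> lines = pipe(Arrays.asList("0", "2", "3"), "sh", "-c", script);
        // Nothing appears on stdout unless the caller prints the returned lines.
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```

Prefixing the inline script with `set -x;` should make the trace lines appear on the console even if the println loop is removed, which mirrors the behaviour reported in the original question.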

It seems like pipe() does not work the way I want. I guess I could open a
socket and write to the downstream process.
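
As a rough illustration of the socket idea, here is a hedged plain-Java sketch. The class SocketSinkSketch, the helpers sendAll() and receiveOnce(), and the one-record-per-line wire format are all assumptions made for the example; nothing in the thread specifies them. In a Spark job, something like sendAll() might be called once per partition (for example from foreachPartition) so that each partition opens a single connection, but that integration is not shown or tested here. The main method uses a local ServerSocket as a stand-in for the downstream system.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SocketSinkSketch {
    /** Sends every record to host:port, one per line, over a single connection. */
    public static void sendAll(Iterator<String> records, String host, int port)
            throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8))) {
            while (records.hasNext()) {
                out.print(records.next());
                out.print('\n');
            }
            out.flush();
            // PrintWriter swallows IOExceptions, so check for errors explicitly.
            if (out.checkError()) {
                throw new IOException("write to downstream system failed");
            }
        }
    }

    /** Stand-in for the downstream system: accepts one connection and collects its lines. */
    public static List<String> receiveOnce(ServerSocket server) throws IOException {
        List<String> received = new ArrayList<>();
        try (Socket client = server.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                received.add(line);
            }
        }
        return received;
    }

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // port 0 picks any free port
            int port = server.getLocalPort();
            Thread sender = new Thread(() -> {
                try {
                    sendAll(Arrays.asList("0", "2", "3").iterator(), "127.0.0.1", port);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            sender.start();
            List<String> received = receiveOnce(server);
            sender.join();
            System.out.println(received);
        }
    }
}
```

The sketch has no retries, batching, or back-pressure handling; a real downstream link would need to decide what happens when the receiver is slow or unavailable.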

Any suggestions would be greatly appreciated

Thanks Andy 

From:  Jey Kottalam <jey@cs.berkeley.edu>
Reply-To:  <jey@cs.berkeley.edu>
Date:  Friday, September 19, 2014 at 12:35 PM
To:  Andrew Davidson <Andy@SantaCruzIntegration.com>
Cc:  "user@spark.apache.org" <user@spark.apache.org>
Subject:  Re: RDD pipe example. Is this a bug or a feature?

> Hi Andy,
> 
> That's a feature -- you'll have to print out the return value from
> collect() if you want the contents to show up on stdout.
> 
> Probably something like this:
> 
> for(Iterator<String> iter = rdd.pipe(pwd +
> "/src/main/bin/RDDPipe.sh").collect().iterator(); iter.hasNext();)
>    System.out.println(iter.next());
> 
> 
> Hope that helps,
> -Jey
> 
> On Fri, Sep 19, 2014 at 11:21 AM, Andy Davidson
> <Andy@santacruzintegration.com> wrote:
>>  Hi
>> 
>>  I wrote a little Java job to try to figure out how RDD pipe works.
>>  Below is my test shell script. If I turn on debugging in the script, I get
>>  output in my console. If debugging is turned off in the shell script, I do
>>  not see anything in my console. Is this a bug or a feature?
>> 
>>  I am running the job locally on a Mac
>> 
>>  Thanks
>> 
>>  Andy
>> 
>> 
>>  Here is my Java
>> 
>>          rdd.pipe(pwd + "/src/main/bin/RDDPipe.sh").collect();
>> 
>> 
>> 
>>  #!/bin/sh
>>  #
>>  # Use this shell script to figure out how spark RDD pipe() works
>>  #
>> 
>>  set -x # turns shell debugging on
>>  #set +x # turns shell debugging off
>> 
>>  while read x ;
>>  do
>>      echo RDDPipe.sh $x ;
>>  done
>> 
>> 
>> 
>>  Here is the output if debugging is turned on
>> 
>>  $ !grep
>>  grep RDDPipe run.sh.out
>>  + echo RDDPipe.sh 0
>>  + echo RDDPipe.sh 0
>>  + echo RDDPipe.sh 2
>>  + echo RDDPipe.sh 0
>>  + echo RDDPipe.sh 3
>>  + echo RDDPipe.sh 0
>>  + echo RDDPipe.sh 0
>>  $
> 


