spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Campbell <michael.campb...@gmail.com>
Subject Re: HELP!? Re: Streaming trouble (reduceByKey causes all printing to stop)
Date Thu, 12 Jun 2014 15:49:30 GMT
I don't know if this matters, but I'm looking at the web site that spark
puts up, and I see under the streaming tab:


   - *Started at: *Thu Jun 12 11:42:10 EDT 2014
   - *Time since start: *6 minutes 3 seconds
   - *Network receivers: *1
   - *Batch interval: *5 seconds
   - *Processed batches: *0
   - *Waiting batches: *1




Why would a batch be "waiting" for long over my batch time of 5 seconds?


On Thu, Jun 12, 2014 at 10:18 AM, Michael Campbell <
michael.campbell@gmail.com> wrote:

> Annnnd... it's NOT working.
>
> Here's the code:
>
>         val bytes = kafkaStream.map({ case (key, messageBytes) =>
> messageBytes}) // Map to just get the bytes part out...
>         val things = bytes.flatMap(bytesArrayToThings) // convert to a
> thing
>         val srcDestinations = things.map(thing =>
> (ipToString(thing.getSourceIp), Set(ipToString(thing.getDestinationIp))))
> // up to this point works fine.
>
> // this fails to print
>         val srcDestinationSets = srcDestinations.reduceByKey((exist:
> Set[String], addl: Set[String]) => exist ++ addl)
>
>
> What it does...
>
> From a kafka message containing many "things", convert the message to an
> array of said "things", flatMap them out to a stream of 1 "thing" at a
> time, pull out and make a Tuple of a (SourceIP, DestinationIP).
>
> ALL THAT WORKS.  If I do a "srcDestinations.print()" I get output like the
> following, every 5 seconds, which is my batch size.
>
> -------------------------------------------
> Time: 1402582000000 ms
> -------------------------------------------
> (10.30.51.216,Set(10.20.1.1))
> (10.20.11.3,Set(10.10.61.98))
> (10.20.11.3,Set(10.10.61.95))
> ...
>
>
>
> What I want is a SET of (sourceIP -> Set(all the destination Ips))  Using
> a set because as you can see above, the same source may have the same
> destination multiple times and I want to eliminate dupes on the destination
> side.
>
> When I call the reduceByKey() method, I never get any output.  When I do a
> "srcDestinationSets.print()"  NOTHING EVER PRINTS.  Ever.  Never.
>
> What am I doing wrong?  (The same happens for "reduceByKeyAndWindow(...,
> Seconds(5))".)
>
> I'm sure this is something I've done, but I cannot figure out what it was.
>
> Help, please?
>
>
>

Mime
View raw message