spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Prashant Sharma <>
Subject Re: Roadblock with Spark 0.8.0 ActorStream
Date Wed, 09 Oct 2013 09:25:28 GMT
On Fri, Oct 4, 2013 at 8:57 PM, Paul Snively <> wrote:

> Hi Prashant!
> That's very interesting. So this appears to be a general issue with
>> ActorStream, then?
>> With all DStream's I suppose.
> OK, so that seems like a bug independent of the issue around having no
> ActorSystem in scope for runJob(), then.
> So, it is not a bug as Matei already pointed out.

> I have not spent enough time on spray-io yet, but if it is implemented as
> ActorExtension, then using actorStream is the best fit. For example I have
> used akka-camel/akka-zeromq to receive streams from variety of endpoints. I
> am not suggesting you should use it, although it has SSL support etc. That
> was the original motivation while designing it. But if it can be somehow
> used in a different and more useful way, I had encourage it.
>  I'm reluctant to add another abstraction layer here, since the motivation
> for using spray-io in the first place is its extreme scalability, and we
> mostly want to rely on Spark Streaming here for managing availability and
> recovery. To me, it seems like we've identified two obvious issues: one
> revolving around foreach ... foreach ... over a DStream not working
> correctly, the other revolving around runJob() not having an ActorSystem in
> scope. While I understand that there is much other work to do, both of
> these seem, first of all, rather serious, and secondly, as if they should
> have relatively easy fixes. Of course, I'm open to being corrected in this
> belief. :-)
> I had like to understand a bit further, as to what are your ultimate
> intention in a bit more detail on architecture from a higher level.
> Probably I am hinting on a redesign.
> That's certainly a valid option!
> Without discussing things I'm not at liberty to, all I can really say is
> that I'm trying to build a real-time streaming server that takes a
> connection from a client that expects to be able to then send arbitrary
> bytes to the server. On the server, I want a DStream per socket connection
> to process the incoming data. There's other stuff, such as this needing to
> support TLS mutual authentication, but I'll get there when I have data
> flowing. :-)
> It's really quite simple, or at least it should be, and again, I believe
> it will be, when the bugs we've identified are fixed.
> In your case, the code is still complicated to spot where exactly we are
> leaking an ActorRef in the test case ?
> If it helps, the only ActorRef in my code now is in the SpraySslClient,
> i.e. nowhere that's being serialized.
>  or in our Handler.
> I have to respectfully suggest it is indeed this. OTOH, is it really a
> "leak?" I am, in fact, closing over objects that, sooner or later,
> construct ActorRefs. The whole point of my task is to do something with
> data coming from an ActorRef that represents a TCP connection, then send
> the result back to that same ActorRef. I guess I'm having a little trouble
> understanding how many actorStream use cases there are that wouldn't
> involve ultimately closing over at least one ActorRef. :-)
> It is also possible to start multiple actorStream receiver in advance and
> then using their supervisor to start more actor as receiver for you, for
> example on each connection (if that makes any sense, take a look at it).
> I'm afraid I didn't quite follow that, but it sounds interesting!
> At this point, though, I have to suggest that the path forward, simply for
> Spark Streaming to work correctly out of the box, is clear. I have created
> an issue against the missing ActorSystem in runJob() in JIRA, but it sounds
> like we need another one for the foreach ... foreach ... issue. I'm afraid
> I don't know exactly how to characterize that issue. Could I possibly
> impose upon you to create that ticket?
BTW !, Did you consider doing above ? It can make some sense from the way
spark streaming is designed. IMO Spawning a new dstream on each connection
doesn't seem good architectural choice, rather a new actor is. I might be

Thanks again for digging into tis!
> Paul
> --

View raw message