spark-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Prashant Sharma <scrapco...@gmail.com>
Subject Re: Roadblock with Spark 0.8.0 ActorStream
Date Fri, 04 Oct 2013 08:57:22 GMT
On Thu, Oct 3, 2013 at 10:15 PM, Paul Snively <psnively@icloud.com> wrote:

> Hi Prahant,
>
> First, thanks so much for taking the time to investigate this!
>
> On Oct 3, 2013, at 3:59 AM, Prashant Sharma wrote:
>
> Trying to replicate it following change to ActorWordCount can reproduce it.
>
>  lines.flatMap(_.split("\\s+")).map(x => (x, 1)).reduceByKey(_ +
> _).foreach {
>       x => x foreach { x1 => con ! x1 }
>     }
>
> While this works
>  lines.flatMap(_.split("\\s+")).map(x => (x, 1)).reduceByKey(_ +
> _).foreach {
>        x => con ! x.collect()
>     }
>
>
> That's very interesting. So this appears to be a general issue with
> ActorStream, then?
>
> With all DStream's I suppose.

> There is another important thing, for now it is not possible to pass
> ActorRef while creating a props instance for actorStream. The reason is, it
> serializes the actorRef and sends it to runJob(which is dumb to not have
> actor system).
>
>
> I can see why it wouldn't be obvious that runJob() needs to have an
> ActorSystem in scope, but it does seem like something that needs fixing.
>
> So alternative approach like by passing path(as string) or ActorPath and
> then instantiate the ActorRef, may be used.
>
>
> You'd hope so, but if you run the unit test in my updated project, still
> at <https://www.dropbox.com/s/k9n5z47vw96r8o3/spray-ssl-server.zip>,
> you'll see that it still doesn't work. I suspect that an ActorRef that is
> part of the spray-io implementation is being serialized, about which
> there's nothing I can realistically do.
>
>
I think you are right !


> Incidentally, spray-io, which has become Akka IO as of Akka 2.2, is
> essentially a HIGHLY scalable presentation of Java NIO as Akka actors. It's
> the foundation of spray's own HTTP server and client implementation, and
> Akka's networking as well. In my case, I'm just trying to create a raw
> SSL/TLS-aware socket server that constructs a Spark Stream for each
> incoming connection. Again, since spray-io presents socket connections as
> Akka Actors, ActorStream seems like an obvious fit. And it is—we're just
> learning useful things about certain use cases.
>
>
I have not spent enough time on spray-io yet, but if it is implemented as
ActorExtension, then using actorStream is the best fit. For example I have
used akka-camel/akka-zeromq to receive streams from variety of endpoints. I
am not suggesting you should use it, although it has SSL support etc. That
was the original motivation while designing it. But if it can be somehow
used in a different and more useful way, I had encourage it.

I had like to understand a bit further, as to what are your ultimate
intention in a bit more detail on architecture from a higher level.
Probably I am hinting on a redesign. In your case, the code is still
complicated to spot where exactly we are leaking an ActorRef in the test
case ? or in our Handler.

It is also possible to start multiple actorStream receiver in advance and
then using their supervisor to start more actor as receiver for you, for
example on each connection (if that makes any sense, take a look at it).

This should be fixable I suppose in future, but for that I have to learn
> how to hack into our runjob mechanism. Will try to check it ASAP. For now
> may be we can create these issues in jira ?
>
>
> Very good idea. I will do that now.
>
>
> Please let me know if this fixes your problems.
>
>
> Unfortunately it doesn't. Please let me know if you are unable to
> reproduce the problem with my project, or have any other questions or
> comments.
>
> Thanks so much again for your time!
> Paul
>
>
>


-- 
s

Mime
View raw message