On Mon, Jan 20, 2014 at 11:05 PM, Ognen Duzlevski <ognen@nengoiksvelzud.com> wrote:

Thanks. I will try that, but your assumption is that something is failing in an obvious way, with a message. Judging by the spark-shell, which is just frozen, I would say something is "stuck". Will report back.

Given the suspicious nature of the shell "freezing", it looked to me like a timeout or some kind of "wait".

I whipped out tcpdump on a node in the cluster and noticed that the nodes try to connect back to the master on some (random?) port. I realized that my VPC security group was too restrictive. As soon as I allowed all TCP and UDP traffic within the VPC, it magically worked ;)
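
For anyone who wants to check for this kind of blocking without tcpdump, here is a minimal sketch of a TCP reachability probe in Python. It is not part of Spark; the function name `can_connect` and the default timeout are assumptions made for illustration, and the host and port to test have to come from your own setup (for example, a port seen in a packet capture).

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration only. A refused connection, a timeout,
    or a name-resolution failure all count as "not reachable".
    """
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, socket.timeout and socket.gaierror.
        return False
```

Run from a worker node, something like `can_connect("<master-host>", <port>)` with your own values would return False if the traffic is filtered. A probe that hangs until the timeout, rather than failing immediately, usually suggests packets are being dropped silently, which would fit a shell that just sits there.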

So, problem solved. It is not a bug after all, just traffic being blocked.

In any case, I am documenting this as I go. As soon as I have a viable "data pipeline" in the VPC I will publish something for everyone to read; I figure another account of someone's experience wouldn't hurt.