spark-user mailing list archives

From: Matei Zaharia <matei.zaha...@gmail.com>
Subject: Re: Quality of documentation (rant)
Date: Tue, 21 Jan 2014 19:10:07 GMT
Ah okay, kind of weird that it worked with a small file. Maybe that was being done locally
since the file was small.

If you do run into further issues with S3, one other idea is to build Spark against a newer
version of the Hadoop client library (Spark uses Hadoop’s data source classes to read data,
so its S3 support comes from that library). You can do this by rebuilding Spark with

SPARK_HADOOP_VERSION=2.2.0 sbt/sbt clean assembly
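
(For reference, a minimal sketch of reading from S3 in spark-shell through that Hadoop layer; the bucket, path, and credentials below are placeholders.)

// Placeholder credentials; substitute your own.
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")

// textFile goes through the Hadoop input layer, so this exercises the S3
// support provided by whichever Hadoop client library Spark was built against.
val lines = sc.textFile("s3n://your-bucket/path/to/data")
println(lines.count())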

Matei

On Jan 21, 2014, at 3:04 AM, Ognen Duzlevski <ognen@nengoiksvelzud.com> wrote:

> On Mon, Jan 20, 2014 at 11:05 PM, Ognen Duzlevski <ognen@nengoiksvelzud.com> wrote:
> 
> Thanks. I will try that, but your assumption is that something is failing in an obvious
> way, with an error message. Judging by the look of the spark-shell (it just freezes), I
> would say something is "stuck". Will report back.
> 
> Given the suspicious nature of the "freezing" of the shell, it looked to me like a timeout
> or some kind of a "wait".
> 
> I whipped out tcpdump on a node in the cluster and noticed that the nodes try to connect
> back to the master on some (random?) port. I realized that my VPC security group was too
> restrictive. As soon as I allowed all TCP and UDP traffic within the VPC, it magically
> worked ;)
> 
> So, problem solved. It is not a bug after all, just traffic being blocked.
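
(The random port the executors connect back on is most likely the driver port, spark.driver.port, which defaults to a random value. Below is a minimal sketch of pinning it to a fixed value in a standalone app, with a hypothetical port number and master hostname; other services such as the block manager and file server still pick random ports, so allowing all traffic within the VPC, as described above, is often the simpler fix.)

import org.apache.spark.SparkContext

// Hypothetical fixed port; open it (plus the standalone master port, 7077 by
// default) in the security group instead of allowing all intra-VPC traffic.
System.setProperty("spark.driver.port", "51000")

val sc = new SparkContext("spark://master-host:7077", "s3-pipeline-test")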
> 
> In any case, I am documenting this as I go. As soon as I have a viable "data pipeline"
> in the VPC, I will publish something for everyone to read; I figure another documented
> experience wouldn't hurt.
> 
> Cheers,
> Ognen 

