spark-user mailing list archives

From Jey Kottalam <...@cs.berkeley.edu>
Subject Re: Quality of documentation (rant)
Date Mon, 20 Jan 2014 22:59:35 GMT
>> This sounds like either a bug, or the S3 library somehow requiring a lot of
>> memory to read a block. There isn’t a separate way to run HDFS over S3;
>> Hadoop just has different implementations of “file systems”, one of which is
>> S3. There’s a pointer to these at the bottom of
>> http://spark.incubator.apache.org/docs/latest/ec2-scripts.html#accessing-data-in-s3
>> but it is indeed pretty well hidden in the docs.
>
>
> Hmmm. Maybe a bug, then. If I read a small 600-byte file via the s3n:// URI,
> it works on a Spark cluster. If I try a 20 GB file, it just sits there,
> frozen. Is there anything I can do to instrument this and figure out
> what is going on?
>

Try taking a look at the stderr log of the executor that failed. You
should hopefully see a more detailed error message there. The stderr
logs can be found by browsing to http://mymaster:8080, where
`mymaster` is the hostname of your Spark master.
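
For reference, here is a minimal sketch of the kind of read being described,
run from the Spark shell. The bucket and file names are made up, and the
credentials can also be embedded directly in the s3n:// URI instead, as
described in the EC2 docs linked above:

    // Set AWS credentials on the Hadoop configuration used by this SparkContext
    // (placeholder values; substitute your own keys or embed them in the URI)
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")

    // Read the file through the s3n filesystem implementation
    val lines = sc.textFile("s3n://my-bucket/big-file.txt")

    // Force a full read across the executors; if this hangs, check the
    // executor stderr logs via the master web UI at port 8080
    println(lines.count())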

Hope that helps,
-Jey
