spark-issues mailing list archives

From "madhukara phatak (JIRA)" <>
Subject [jira] [Commented] (SPARK-4414) SparkContext.wholeTextFiles Doesn't work with S3 Buckets
Date Mon, 16 Mar 2015 07:24:41 GMT


madhukara phatak commented on SPARK-4414:

 I just ran your example on my local machine; here is the gist.
It works fine for me. Can you test the same?

> SparkContext.wholeTextFiles Doesn't work with S3 Buckets
> --------------------------------------------------------
>                 Key: SPARK-4414
>                 URL:
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.1.0, 1.2.0
>            Reporter: Pedro Rodriguez
>            Priority: Critical
> SparkContext.wholeTextFiles does not read files which SparkContext.textFile can read.
> Below are general steps to reproduce; my specific case following these steps is in a git repo.
> Steps to reproduce.
> 1. Create Amazon S3 bucket, make public with multiple files
> 2. Attempt to read bucket with
> sc.wholeTextFiles("s3n://mybucket/myfile.txt")
> 3. Spark returns the following error, even though the file exists.
> Exception in thread "main" File does not exist: /myfile.txt
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(
> 	at org.apache.hadoop.mapreduce.lib.input.CombineFileInputFormat$OneFileInfo.<init>(
> 4. Change the call to
> sc.textFile("s3n://mybucket/myfile.txt")
> and there is no error; the application runs fine.
> There is a question about this on StackOverflow as well:
> Here is a link to the repo/lines of code. The uncommented call doesn't work; the commented
> call works as expected:
> It would be easy to use textFile with a multi-file argument instead, but this should work correctly
> for S3 bucket files as well.
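
The contrast the report draws between the two APIs can be sketched without Spark at all. Below is a plain-Python, local-filesystem analogy (the function names `text_file` and `whole_text_files` are my own, standing in for sc.textFile and sc.wholeTextFiles): textFile yields one record per line across all inputs and discards file names, while wholeTextFiles yields one (path, full-content) pair per file. The bug is that the second contract fails on s3n:// paths that the first reads fine.

```python
import os
import tempfile

def text_file(paths):
    """Analogy for sc.textFile: one record per line; file names are discarded."""
    lines = []
    for p in paths:
        with open(p) as f:
            lines.extend(f.read().splitlines())
    return lines

def whole_text_files(paths):
    """Analogy for sc.wholeTextFiles: one (path, content) record per file."""
    records = []
    for p in paths:
        with open(p) as f:
            records.append((p, f.read()))
    return records

# Build a throwaway file mirroring the report's myfile.txt.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "myfile.txt")
    with open(path, "w") as f:
        f.write("line1\nline2\n")
    lines = text_file([path])
    records = whole_text_files([path])

print(lines)    # ['line1', 'line2']
print(records)  # one (path, content) pair for the file
```

Per the report, on Spark 1.1.0/1.2.0 only the line-oriented read succeeds against s3n://; the per-file read raises "File does not exist" even though both address the same object.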

This message was sent by Atlassian JIRA

