spark-user mailing list archives

From Tomer Benyamini <tomer....@gmail.com>
Subject Cannot read from s3 using "sc.textFile"
Date Tue, 07 Oct 2014 11:15:22 GMT
Hello,

I'm trying to read from S3 using a simple Spark Java app:

---------------------

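import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

// Run locally with a single worker thread.
SparkConf sparkConf = new SparkConf().setAppName("TestApp");
sparkConf.setMaster("local");
JavaSparkContext sc = new JavaSparkContext(sparkConf);

// Credentials for the s3:// scheme (values redacted).
sc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", "XXXXXX");
sc.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", "XXXXXX");

// count() triggers the actual read and is where the exception is thrown.
String path = "s3://bucket/test/testdata";
JavaRDD<String> textFile = sc.textFile(path);
System.out.println(textFile.count());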

---------------------
But I'm getting this error:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: s3://bucket/test/testdata
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:251)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:175)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1097)
at org.apache.spark.rdd.RDD.count(RDD.scala:861)
at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:365)
at org.apache.spark.api.java.JavaRDD.count(JavaRDD.scala:29)
....

Looking at the debug log, I see that
org.jets3t.service.impl.rest.httpclient.RestS3Service returned a 404
error when trying to locate the file.

Accessing the file from a simple Java program with
com.amazonaws.services.s3.AmazonS3Client works just fine.
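
One thing that may be worth checking (an assumption, not something confirmed
in this message): in stock Hadoop builds of this era, the s3:// scheme is
usually bound to a block-based filesystem that only sees data written by
Hadoop itself, while s3n:// reads ordinary S3 objects such as those
AmazonS3Client can see. The credential property names follow the scheme, so
switching schemes also means switching keys. Below is a minimal sketch of
that mapping using only java.net.URI; the class and method names are
hypothetical helpers for illustration, not part of Spark or Hadoop.

```java
import java.net.URI;

// Hypothetical helper: shows how the path scheme determines which
// Hadoop credential properties apply (fs.<scheme>.awsAccessKeyId, ...).
class S3PathCheck {

    // Rebuilds the path with a different scheme, keeping bucket and key.
    public static String withScheme(String path, String scheme) {
        URI uri = URI.create(path);
        return scheme + "://" + uri.getAuthority() + uri.getPath();
    }

    // Property name for the access key that matches the path's scheme.
    public static String accessKeyProperty(String path) {
        return "fs." + URI.create(path).getScheme() + ".awsAccessKeyId";
    }

    // Property name for the secret key that matches the path's scheme.
    public static String secretKeyProperty(String path) {
        return "fs." + URI.create(path).getScheme() + ".awsSecretAccessKey";
    }

    public static void main(String[] args) {
        String path = "s3://bucket/test/testdata";
        String nativePath = withScheme(path, "s3n");

        System.out.println(nativePath);                    // s3n://bucket/test/testdata
        System.out.println(accessKeyProperty(nativePath)); // fs.s3n.awsAccessKeyId
        System.out.println(secretKeyProperty(nativePath)); // fs.s3n.awsSecretAccessKey
    }
}
```

If the assumption holds, the resulting path would be passed to sc.textFile
and the two property names to sc.hadoopConfiguration().set(...) in place of
the fs.s3.* keys used above.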

Any ideas?

Thanks,
Tomer
