spark-user mailing list archives

From Ankur Srivastava
Subject Spark GraphFrame ConnectedComponents
Date Thu, 05 Jan 2017 00:40:02 GMT

I am trying to use the ConnectedComponents algorithm of GraphFrames, but by
default it needs a checkpoint directory. My Spark cluster uses S3 as its DFS
and I do not have access to an HDFS file system, so I tried using an S3
directory as the checkpoint directory, but I ran into the exception below:

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: s3n://<folder-path>, expected: file:///
    at org.apache.hadoop.fs.FileSystem.checkPath(
    at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(
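
The RawLocalFileSystem frame in the trace suggests the s3n path was handed
to the local filesystem, which then rejected it. As a rough illustration of
that kind of check, here is a minimal Python sketch. It is a simplified model
that compares URI schemes only, not Hadoop's actual code. The function name
check_path and the bucket name example-bucket are hypothetical, and it raises
ValueError where the Java code raises IllegalArgumentException.

```python
from urllib.parse import urlparse


def check_path(fs_uri: str, path: str) -> None:
    """Simplified model: reject a path whose scheme differs from the filesystem's.

    Hypothetical helper for illustration only; it compares schemes and
    nothing else.
    """
    fs_scheme = urlparse(fs_uri).scheme
    path_scheme = urlparse(path).scheme
    # A path without a scheme is treated as belonging to this filesystem.
    if path_scheme and path_scheme.lower() != fs_scheme.lower():
        raise ValueError(f"Wrong FS: {path}, expected: {fs_uri}")


# The filesystem in use is the local one (file:///), but the checkpoint
# path carries the s3n scheme, so the check fails.
try:
    check_path("file:///", "s3n://example-bucket/checkpoints")
except ValueError as err:
    print(err)  # Wrong FS: s3n://example-bucket/checkpoints, expected: file:///
```

In this model the check passes only when the path has no scheme or has the
same scheme as the filesystem, which would be consistent with the checkpoint
path being resolved against the default (local) filesystem rather than
against its own s3n scheme.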

If I set the checkpoint interval to -1 to avoid checkpointing, the driver
just hangs after 3 or 4 iterations.

Is there a way to set the default FileSystem to S3 for Spark, or is there
another option?

