spark-user mailing list archives

From Loic DESCOTTE <>
Subject Spark on Kubernetes : unable to write files to HDFS
Date Wed, 16 Dec 2020 09:12:12 GMT

I am using Spark on Kubernetes, and I get the following error when I try to write data to
HDFS: "No FileSystem for scheme: hdfs"

More details :

I am submitting my application with spark-submit like this:

spark-submit --master k8s://https://myK8SMaster:6443 \
--deploy-mode cluster \
--name hello-spark \
--class Hello \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=gradiant/spark:2.4.4 hdfs://hdfs-namenode/user/loic/jars/helloSpark.jar

Then the driver and the 2 executors are created in K8S.

But then it fails. When I look at the logs of the driver, I see this:

Exception in thread "main" No FileSystem for scheme: hfds
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(
at org.apache.hadoop.fs.FileSystem.createFileSystem(
at org.apache.hadoop.fs.FileSystem.access$200(
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(
at org.apache.hadoop.fs.FileSystem$Cache.get(
at org.apache.hadoop.fs.FileSystem.get(
at org.apache.hadoop.fs.Path.getFileSystem(
at org.apache.spark.sql.execution.datasources.DataSource.planForWritingFileFormat(DataSource.scala:424)
at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:524)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:290)
at Hello$.main(hello.scala:24)
at Hello.main(hello.scala)

As you can see, my application jar helloSpark.jar is correctly loaded from HDFS by
spark-submit, but writing to HDFS fails.
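One detail worth noting: the exception reports scheme "hfds" while the prose above says "hdfs" (which of the two carries the transposition is not clear from this message alone). Hadoop looks up the filesystem class using the scheme taken verbatim from the URI of the path being accessed, so the string in the exception is exactly what was parsed from the path. A minimal sketch of that scheme extraction, with hypothetical paths and no Hadoop dependency needed:

```scala
// Sketch: the scheme in "No FileSystem for scheme: X" comes straight from
// the URI of the path, via java.net.URI (which Hadoop's Path wraps).
object SchemeDemo {
  def scheme(path: String): String = new java.net.URI(path).getScheme

  def main(args: Array[String]): Unit = {
    // Hypothetical output paths for illustration:
    println(scheme("hdfs://hdfs-namenode/user/loic/out")) // hdfs
    println(scheme("hfds://hdfs-namenode/user/loic/out")) // hfds -- looked up as-is, no correction
  }
}
```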

I have also tried adding the Hadoop client and HDFS dependencies to the spark-submit command:

--packages org.apache.hadoop:hadoop-client:2.6.5,org.apache.hadoop:hadoop-hdfs:2.6.5 \

But the error is still there.
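For reference, here is a variant of the spark-submit command above that explicitly maps the hdfs scheme to Hadoop's DistributedFileSystem via the spark.hadoop.* passthrough (a sketch only: the cluster URL, image, and jar path are copied from the command above, and whether this helps depends on why the scheme lookup fails):

```shell
spark-submit --master k8s://https://myK8SMaster:6443 \
  --deploy-mode cluster \
  --name hello-spark \
  --class Hello \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.container.image=gradiant/spark:2.4.4 \
  --conf spark.hadoop.fs.hdfs.impl=org.apache.hadoop.hdfs.DistributedFileSystem \
  hdfs://hdfs-namenode/user/loic/jars/helloSpark.jar
```

Any spark.hadoop.* property is copied into the Hadoop Configuration seen by the driver and executors, so fs.hdfs.impl set this way bypasses the ServiceLoader-based scheme discovery.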

Here is the Scala code of my application:

import java.util.Calendar

import org.apache.spark.sql.SparkSession

case class Data(singleField: String)

object Hello {
    def main(args: Array[String]): Unit = {

        val spark = SparkSession
          .builder()
          .appName("Hello Spark")
          .getOrCreate()

        import spark.implicits._

        val now = Calendar.getInstance().getTime().toString
        val data = List(Data(now)).toDF()

        // The write that triggers the failure (hello.scala:24 in the stack
        // trace above); the output path was cut off in the original message.
        data.write.csv("hdfs://...")
    }
}

Thanks for your help,
