spark-user mailing list archives

From Abhishek Jindal <abhishekjinda...@gmail.com>
Subject [Spark Core] saveAsTextFile is unable to rename a directory using hadoop-azure NativeAzureFileSystem
Date Mon, 13 Sep 2021 16:21:10 GMT
Hello,

I am trying to use Spark's rdd.saveAsTextFile, which calls
FileSystem.rename() under the hood. It fails with
“com.microsoft.azure.storage.StorageException: One of the request inputs is
not valid” when using the hadoop-azure NativeAzureFileSystem. I have written
a small Scala test program that renames a directory in Azure Blob Storage
and replicates the issue. Here is my code:

import java.net.URI

import org.apache.commons.lang3.exception.ExceptionUtils
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

import scala.util.control.NonFatal

/**
  * A utility to test renaming a hadoop-azure path.
  */
object AzureRenameTester {

  def main(args: Array[String]): Unit = {

    if (args.isEmpty) {
      throw new IllegalArgumentException("The Azure Blob storage key must be provided!")
    }

    val key = args.head
    val hadoopConfig = new Configuration()
    hadoopConfig.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
    hadoopConfig.set("fs.wasbs.impl", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
    // Config keys are case-sensitive: the suffix must be lowercase "impl".
    hadoopConfig.set("fs.AbstractFileSystem.wasbs.impl", "org.apache.hadoop.fs.azure.Wasbs")
    hadoopConfig.set("fs.azure.account.key.<account>.blob.core.windows.net", key)

    val input = new URI("wasbs://<container>@<account>.blob.core.windows.net/testing")
    val inputPath = new Path(input)
    val output = new URI("wasbs://<container>@<account>.blob.core.windows.net/testingRenamed")
    val outputPath = new Path(output)
    val hadoopFs = FileSystem.get(input, hadoopConfig)

    try {
      println(s"Renaming from $inputPath to $outputPath")
      // FileSystem.rename reports some failures by returning false rather
      // than throwing, so the result is worth logging as well.
      val renamed = hadoopFs.rename(inputPath, outputPath)
      println(s"rename returned $renamed")
    } catch {
      case NonFatal(ex) =>
        println(ExceptionUtils.getMessage(ex))
        println(ExceptionUtils.getRootCause(ex))
        throw ex
    }
  }
}
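As a side note on the catch block above: ExceptionUtils.getRootCause simply walks the Throwable.getCause chain to its end. A dependency-free sketch of that behavior, using stand-in RuntimeExceptions rather than the real Azure exception types (the object name is made up for illustration):

```scala
import scala.annotation.tailrec

// Minimal stand-in for ExceptionUtils.getRootCause: follow getCause links
// to the bottom of the chain (guarding against self-referential causes).
object RootCause {
  @tailrec
  def of(t: Throwable): Throwable =
    if (t.getCause == null || (t.getCause eq t)) t
    else of(t.getCause)

  def main(args: Array[String]): Unit = {
    // Mimics the AzureException -> StorageException nesting in the trace below.
    val storage = new RuntimeException("One of the request inputs is not valid.")
    val azure   = new RuntimeException("rename failed", storage)
    println(of(azure).getMessage) // prints the innermost message
  }
}
```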

This code leads to the following error:

[error] Exception in thread "main" org.apache.hadoop.fs.azure.AzureException: com.microsoft.azure.storage.StorageException: One of the request inputs is not valid.
[error] 	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2849)
[error] 	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2721)
[error] 	at org.apache.hadoop.fs.azure.NativeAzureFileSystem$FolderRenamePending.execute(NativeAzureFileSystem.java:460)
[error] 	at org.apache.hadoop.fs.azure.NativeAzureFileSystem.rename(NativeAzureFileSystem.java:3277)
[error] 	at com.qf.util.hdfs.AzureRenameTester$.main(AzureRenameTester.scala:40)
[error] 	at com.qf.util.hdfs.AzureRenameTester.main(AzureRenameTester.scala)
[error] Caused by: com.microsoft.azure.storage.StorageException: One of the request inputs is not valid.
[error] 	at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:87)
[error] 	at com.microsoft.azure.storage.core.StorageRequest.materializeException(StorageRequest.java:315)
[error] 	at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:185)
[error] 	at com.microsoft.azure.storage.blob.CloudBlob.startCopy(CloudBlob.java:735)
[error] 	at com.microsoft.azure.storage.blob.CloudBlob.startCopy(CloudBlob.java:691)
[error] 	at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudBlobWrapperImpl.startCopyFromBlob(StorageInterfaceImpl.java:434)
[error] 	at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.rename(AzureNativeFileSystemStore.java:2788)
[error] 	... 5 more
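For context on why a rename can surface a storage-level copy error at all: Azure Blob Storage has no native directory rename, so NativeAzureFileSystem emulates one by copying each blob under the source prefix (the startCopyFromBlob frame above) and then deleting the originals, tracking progress in a rename-pending record (the FolderRenamePending frame). A toy in-memory sketch of that emulation, purely illustrative and not the driver's actual code:

```scala
import scala.collection.mutable

// Illustrative only: a flat key/value "blob store" offering copy + delete,
// showing how a directory rename must be emulated blob by blob.
object BlobRenameSketch {
  val blobs = mutable.Map[String, String]()

  def renamePrefix(src: String, dst: String): Boolean = {
    val keys = blobs.keys.filter(_.startsWith(src)).toList
    if (keys.isEmpty) return false // nothing under the source "directory"
    // Step 1: copy every blob to the destination prefix.
    keys.foreach(k => blobs(dst + k.stripPrefix(src)) = blobs(k))
    // Step 2: delete the originals. A crash between the steps leaves partial
    // state, which is why the real driver keeps a rename-pending file.
    keys.foreach(blobs.remove)
    true
  }

  def main(args: Array[String]): Unit = {
    blobs("testing/part-00000") = "a"
    blobs("testing/part-00001") = "b"
    println(renamePrefix("testing/", "testingRenamed/")) // true
    println(blobs.keys.toList.sorted)
  }
}
```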

I am currently using spark-core-3.1.1.jar with hadoop-azure-3.2.2.jar; the
same issue also occurs with hadoop-azure-3.3.1.jar. Please advise on how to
resolve this.

Thanks,
Abhishek
