spark-user mailing list archives

From Nikolay Skovpin <kolehan...@gmail.com>
Subject Dynamic partitioning weird behavior
Date Tue, 07 Aug 2018 14:47:43 GMT
Hi guys.
I was investigating the Spark property
/spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")/. It
works perfectly on a local file system, but on S3 I ran into strange behavior:
if the Hive table does not exist yet, or exists but is empty, Spark does not
save any data into it with SaveMode.Overwrite.
What I did:
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder()
  .appName("Test for dynamic partitioning")
  .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
  .getOrCreate()

// Needed for .toDF on a local Seq
import spark.implicits._

val users = Seq(
    ("11", "Nikolay", "1900", "1"),
    ("12", "Nikolay", "1900", "1"),
    ("13", "Sergey", "1901", "1"),
    ("14", "Jone", "1900", "2"))
  .toDF("user_id", "name", "year", "month")

// Write partitioned by year/month to an external S3 path and register the table
users.write
  .partitionBy("year", "month")
  .mode(SaveMode.Overwrite)
  .option("path", "s3://dynamicPartitioning/users")
  .saveAsTable("test.users")

I can see from the logs that Spark populates the .spark-staging directory with
the data and then executes a rename command.
But AlterTableRecoverPartitionsCommand then logs the message: /Found 0
partitions, Finished to gather the fast stats for all 0 partitions/. After
that, the directory on S3 is empty (except for the _SUCCESS flag).
Is this expected, or is it a bug?
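
For comparison, here is a minimal Spark-free sketch of the outcome one would expect from the documented meaning of partitionOverwriteMode: in dynamic mode only the partitions that receive data are replaced, while in static mode with no partition spec every existing partition is dropped first. It is a toy model, not Spark's implementation: the object PartitionOverwriteModel, its helpers, and the extra year=1899 partition are hypothetical names and data introduced only for illustration.

```scala
// Toy model only: a "table" is a map from partition directory to its rows.
// Nothing here is a Spark API; all names are hypothetical.
object PartitionOverwriteModel {
  type Row = (String, String, String, String) // user_id, name, year, month
  type Table = Map[String, Seq[Row]]           // partition dir -> rows

  // Directory name that partitionBy("year", "month") would produce for a row.
  def partitionDir(r: Row): String = s"year=${r._3}/month=${r._4}"

  // Overwrite without a static partition spec.
  // dynamic = true : existing partitions not touched by the new data survive.
  // dynamic = false: every existing partition is dropped before writing.
  def overwrite(existing: Table, incoming: Seq[Row], dynamic: Boolean): Table = {
    val written: Table = incoming.groupBy(partitionDir)
    if (dynamic) existing ++ written else written
  }

  def main(args: Array[String]): Unit = {
    val users: Seq[Row] = Seq(
      ("11", "Nikolay", "1900", "1"),
      ("12", "Nikolay", "1900", "1"),
      ("13", "Sergey", "1901", "1"),
      ("14", "Jone", "1900", "2"))

    // Missing or empty table, as in the report above.
    val fromEmpty = overwrite(Map.empty, users, dynamic = true)
    fromEmpty.keys.toSeq.sorted.foreach(println)
    // year=1900/month=1
    // year=1900/month=2
    // year=1901/month=1

    // Hypothetical pre-existing partition that the new data does not touch.
    val old: Table = Map("year=1899/month=12" -> Seq(("1", "Old", "1899", "12")))
    println(overwrite(old, users, dynamic = true).size)  // 4
    println(overwrite(old, users, dynamic = false).size) // 3
  }
}
```

Under this model an empty or missing table should end up with three partition directories after the write, which is why zero partitions on S3 looks unexpected.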


